pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Edward Z. Yang	90d2593b3e	Revert #132806 , #132736 , #132539 , #132487 (#133570 ) This reverts commit `25df063f04`. This reverts commit `de00c79583`. This reverts commit `419b76c4ac`. This reverts commit `bc57d5b6ff`. Differential Revision: [D61335013](https://our.internmc.facebook.com/intern/diff/D61335013) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133570 Approved by: https://github.com/albanD, https://github.com/jansel, https://github.com/anijain2305	2024-08-15 20:54:21 +00:00
Animesh Jain	de00c79583	[dynamo][inline_inbuilt_nn_modules] Mark nn module tensor static for cudagraphs (#132736 ) Fixes https://github.com/pytorch/pytorch/issues/132714 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132736 Approved by: https://github.com/mlazos ghstack dependencies: #132538	2024-08-06 20:13:28 +00:00
Xuehai Pan	4226ed1585	[BE] Format uncategorized Python files with `ruff format` (#132576 ) Remove patterns ``, `test/`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: #132574	2024-08-04 17:13:31 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Animesh Jain	612ea35395	[dynamo] Introduce UnspecializedBuiltinNNModuleSource (#132312 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132312 Approved by: https://github.com/yanboliang ghstack dependencies: #132302, #132304	2024-08-01 06:21:05 +00:00
Animesh Jain	bcd1d2e832	[dynamo] Introduce UnspecializedNNModule guard source (#132304 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132304 Approved by: https://github.com/yanboliang ghstack dependencies: #132302	2024-08-01 04:35:43 +00:00
Animesh Jain	e772547d70	[dynamo][rename/refactor] Rename guard_source NN_MODULE to SPECIALIZED_NN_MODULE (#132302 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132302 Approved by: https://github.com/yanboliang	2024-08-01 04:35:43 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
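The two claims in the commit above (the literal compiles to fewer instructions, and the factory call is affected by rebinding the name `dict`) can be checked with a short script. This is a minimal sketch: the helper `opnames` is a name introduced here for illustration, and exact opcode names differ between CPython versions, so the script prints them rather than asserting a fixed sequence.

```python
import collections
import dis


def opnames(expr):
    """Return the opcode names CPython emits for a single expression."""
    code = compile(expr, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("{}")
factory_ops = opnames("dict()")

# The literal builds the map directly; the factory call has to look up the
# name `dict` and then call whatever it is bound to.
print("literal:", literal_ops)
print("factory:", factory_ops)

# Shadowing: evaluate both forms in a namespace where `dict` was rebound.
namespace = {"dict": collections.OrderedDict}
from_factory = eval("dict()", namespace)
from_literal = eval("{}", namespace)

print(type(from_factory).__name__)  # OrderedDict
print(type(from_literal).__name__)  # dict
```

The literal is unaffected by the rebinding because it never performs a name lookup.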
Oguz Ulgen	54b0006cb2	Evaluate symexprs on load path of cache not write (#128997 ) When caching is enabled, an internal model fails with ``` assert_size_stride(bmm_9, (17, s0, 512), (54784, 512, 1)) AssertionError: expected size 17==17, stride 57344==54784 at dim=0 ``` looking at this model, the exact problem is when the cache is hit on the forward graph, the generated code for backward fails since the strides of the outputs of forward, passed to backward as inputs, are not what we expected. This PR changes the evaluation logic so that we defer evaluation of output stride exprs to load path as opposed to eagerly doing it on save path. I have not been able to come up with a unit test repro for this problem. Differential Revision: [D58796503](https://our.internmc.facebook.com/intern/diff/D58796503) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128997 Approved by: https://github.com/ezyang	2024-06-20 08:55:12 +00:00
Xuehai Pan	dd143d44cc	[BE] enable UFMT for top-level files `torch/*.py` (#127707 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127707 Approved by: https://github.com/ezyang	2024-06-12 20:15:05 +00:00
Aaron Orenstein	ea614fb2b1	Flip default value for mypy disallow_untyped_defs [2/11] (#127839 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127839 Approved by: https://github.com/oulgen	2024-06-08 18:23:08 +00:00
Aaron Gokaslan	1dd42e42c4	[BE]: Try TCH autofixes on torch/ (#125536 ) Tries TCH autofixes to see what breaks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536 Approved by: https://github.com/ezyang	2024-05-05 23:13:59 +00:00
Animesh Jain	37c993546d	[dynamo][guards] Bug fix for set_export_info (#125275 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125275 Approved by: https://github.com/yanboliang	2024-05-01 03:46:26 +00:00
Edward Z. Yang	64491c0811	Restore CompileContext as well in backwards (#124626 ) This should fix many of the unknown compile id problems currently afflicting tlparse backwards analysis. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124626 Approved by: https://github.com/bdhirsh	2024-04-23 14:39:52 +00:00
Xuehai Pan	93e249969b	[BE] enable `ruff` rule `RSE` and remove useless parentheses in `raise` statements (#124261 ) Remove useless parentheses in `raise` statements if the exception type is raised with no argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261 Approved by: https://github.com/albanD	2024-04-17 19:29:34 +00:00
Jason Ansel	11e6f84ad8	[dynamo] Graph break on uninitialized nn.Module (#123790 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123790 Approved by: https://github.com/anijain2305 ghstack dependencies: #123700, #123705, #123786	2024-04-12 19:03:13 +00:00
Brian Hirsh	134e56fa33	inductor: log unique id to match output_code to aot graphs (#118647 ) I found it helpful to be able to see, given some inductor output code, which AOT graph it came from. When you have large models with multiple graphs floating around this can be difficult, so I added the aot_config.aot_id to the printed inductor output. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118647 Approved by: https://github.com/ezyang	2024-04-11 14:37:07 +00:00
Animesh Jain	1346ebf12e	[dynamo][guards] Delay DUPLICATE_INPUT guard because of incorrect ordering (#123605 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123605 Approved by: https://github.com/jansel ghstack dependencies: #123606	2024-04-10 07:30:02 +00:00
Jason Ansel	1e9a7df8fe	[dynamo] Compile time optimizations in tx.step() (#121790 ) `python benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` - Before: `symbolic_convert_overhead_stress_test: 10.7s` - After: `symbolic_convert_overhead_stress_test: 8.6s` `tx.step()` is a small part of that benchmark, so likely the speedup in that isolated function is larger than the top line. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121790 Approved by: https://github.com/oulgen	2024-03-15 01:01:05 +00:00
Jason Ansel	7cc476ea16	[dynamo] Fix support for nn.Parameter constructor (part 1) (#120163 ) This captures calls to `torch.nn.Parameter` by lifting them to graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163 Approved by: https://github.com/albanD, https://github.com/yanboliang ghstack dependencies: #121086	2024-03-11 05:14:42 +00:00
Joel Schlosser	dad1b76584	Introduce EphemeralSource for symbols that should be simplified out (#120948 ) Context: view fake-ification should handle closed-over state in ViewFuncs for use in view replay by: * fake-ifying tensors * symbolicizing SymInts This avoids invalid specialization during view replay. However, the symbols / tensors created as intermediates in the view chain should not stick around or be guarded on. This PR introduces an `EphemeralSource` intended to be used as a source for this purpose. It has the following properties: * Considered first to be simplified out in symbol simplification logic * Errors if guarded on Differential Revision: [D54561597](https://our.internmc.facebook.com/intern/diff/D54561597) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120948 Approved by: https://github.com/ezyang	2024-03-06 02:30:52 +00:00
Sam Larsen	06f8af30fa	Change FakeTensor serialization to consider only an _active_ FakeTensor mode (#120848 ) Summary: https://github.com/pytorch/pytorch/pull/108186 made some changes related to FakeTensor serialization such that saving and loading a tensor will give us a meta tensor, even if FakeTensor mode is not enabled. This means we can't properly save and load Tensors as part of Fx graph caching. This PR changes the logic to check if there's an _active_ FakeTensor mode. Test Plan: * New unit tests * Validated unit tests introduced in https://github.com/pytorch/pytorch/pull/108186 still pass Pull Request resolved: https://github.com/pytorch/pytorch/pull/120848 Approved by: https://github.com/eellison, https://github.com/thiagocrepaldi	2024-03-01 02:37:21 +00:00
Elias Ellison	d03b11ad5b	Pass inductor strides forward in ddp optimizer (#120523 ) # Note: Returning Fake Tensors on First AOT Autograd Call # # Inductor will optimize strides of outputs when it deems it profitable. # For instance, converting to channels last. When we split the graph here # into multiple inductor compilations, we need to make sure that the # output strides of one compilation is appropriately passed to the subsequent # compilations. However, the mapping from inductor output to dynamo output # is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing, # subclass handling, etc. In order to replay all this logic we set a flag such that # the first invocation of inductor in aot_autograd will return Fake Tensors with # appropriate strides. Then, all of aot autograd's runtime logic is replayed. # This gives us the appropriately strided outputs here which will reflect runtime strides. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523 Approved by: https://github.com/yf225, https://github.com/bdhirsh	2024-02-29 22:25:00 +00:00
Jason Ansel	01ec8df6d8	[Compiled Autograd] Introduce BackwardState capture (#120382 ) This adds support for backwards hooks that are both: 1) Interior to the graph; and 2) Dynamically generated (e.g. lambdas) We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo after the forwards runs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382 Approved by: https://github.com/xmfan	2024-02-28 20:36:47 +00:00
Animesh Jain	8a59f49da2	[dynamo][compile-time] Collect guard debug stack info only with logs enabled (#120520 ) Reduces backend=eager compile time from 33 to 19 seconds for `MobileBertForQuestionAnswering`. This also helps an internal model where guards.add function is taking 124 seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120520 Approved by: https://github.com/mlazos	2024-02-27 01:51:16 +00:00
Taras Tsugrii	2c8722182e	[dynamo][guards] Avoid unnecessary stack copies. (#119115 ) There is no need to make a `frame_summary_stack` copy in case it's not modified. Proposed change uses copy-on-write functional approach that is easy to understand and is more efficient in case `self.loc_in_frame` is `None` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119115 Approved by: https://github.com/Skylion007	2024-02-10 21:56:00 +00:00
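The copy-on-write idea described in the commit above can be sketched as follows. The function `stack_with_location` and the string-based stack entries are hypothetical stand-ins chosen for illustration; they are not the code in `_guards.py`.

```python
from typing import List, Optional


def stack_with_location(stack: List[str], loc_in_frame: Optional[str]) -> List[str]:
    """Return the stack plus the current location without mutating `stack`.

    Hypothetical helper: when there is nothing to append, the original list
    is returned as-is instead of being copied.
    """
    if loc_in_frame is None:
        return stack  # no modification needed, so no copy is made
    return stack + [loc_in_frame]  # builds a new list; `stack` is untouched


frames = ["a.py:1", "b.py:7"]
same = stack_with_location(frames, None)
extended = stack_with_location(frames, "c.py:3")

print(same is frames)  # True
print(extended)        # ['a.py:1', 'b.py:7', 'c.py:3']
```

The trade-off is that callers must treat the returned list as read-only, since in the `None` case it is the same object as the input.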
Animesh Jain	0c3a1c893e	[dynamo] Setup the globals for guard_fn without a reference to f_locals (#118447 ) UPDATE - I changed the PR because from discussion with @jansel it was clear that someone else was holding on to a reference to f_locals. This PR now solves that problem first. I removed the eval_frame.c part because it was failing tests that use `exec` or `eval` with a weird error like `no no locals found when storing 'math'`. I will debug that in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118447 Approved by: https://github.com/Skylion007, https://github.com/jansel ghstack dependencies: #118975, #118420	2024-02-05 05:39:39 +00:00
Taras Tsugrii	41b63b26c2	[dynamo] Fix incorrect docstring placements in _guards.py. (#119114 ) The incorrect placement makes the docstrings unavailable to `help` and other tools that access them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119114 Approved by: https://github.com/kit1980	2024-02-03 06:25:54 +00:00
lezcano	eb2bdfae88	Make variables in dict LazyTrackers (not lazily guarded yet) and avoid using DICT_KEYS guard (#117625 ) Make variables in dict lazy and remove DICT_KEYS guard. We build the keys of a dict depth-first and we rely on the guards of each element in the dict to create the correct guards. This allows us to remove the rather buggy DICT_KEYS guard and make the guard lazy. The guards are not completely lazy yet, as we instantiate them in `_HashableTracker._eq_impl` but it should be possible to make them truly lazy. Also, adding new types to the supported types within keys should be less error prone. This is marginally less efficient when we graph break, but in turn we should graph break much less. It also makes the dicts code easier to maintain (removes `is_hashable_python_var`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/117625 Approved by: https://github.com/jansel, https://github.com/peterbell10, https://github.com/anijain2305 ghstack dependencies: #117982, #118098, #117983	2024-02-02 14:38:08 +00:00
Edward Z. Yang	46712b019d	Enable local_partial_types (#118467 ) When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418, #118432	2024-01-28 13:38:22 +00:00
voznesenskym	081c5b3adc	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) (#114526 ) Summary: The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). 
We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this) cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng imported-using-ghimport Test Plan: Imported from OSS Reviewed By: huydhn, Chillee Differential Revision: D51566250 Pulled By: voznesenskym Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526 Approved by: https://github.com/Chillee, https://github.com/huydhn	2023-11-26 23:40:32 +00:00
PyTorch MergeBot	2f3beb715c	Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 )" This reverts commit `2ca1119d53`. Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))	2023-11-22 12:52:33 +00:00
voznesenskym	2ca1119d53	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). 
We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926 Approved by: https://github.com/ezyang, https://github.com/eellison	2023-11-20 23:06:37 +00:00
Jez Ng	5b95715bc0	Make {Tracing,Compile}Context.get() return non-optional type (#113535 ) They are used in many contexts that don't actually check if the returned type is `None`. I have also created `try_get()` for the cases where we do actually want an Optional type returned. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535 Approved by: https://github.com/ezyang ghstack dependencies: #113412	2023-11-14 04:31:12 +00:00
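The `get()` / `try_get()` split described in the commit above follows a common pattern, sketched below. `ExampleContext` and `activate` are hypothetical names, not the real `TracingContext` or `CompileContext`. The commit message does not say what the real `get()` does when no context is active, so raising `RuntimeError` here is an assumption.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class ExampleContext:
    """Hypothetical ambient context with a strict and a lenient accessor."""

    _current: Optional["ExampleContext"] = None

    @staticmethod
    def try_get() -> Optional["ExampleContext"]:
        # Lenient accessor: callers must handle the None case themselves.
        return ExampleContext._current

    @staticmethod
    def get() -> "ExampleContext":
        # Strict accessor: the return type is non-optional, so callers that
        # never checked for None no longer need to (assumed to raise here).
        ctx = ExampleContext._current
        if ctx is None:
            raise RuntimeError("no ExampleContext is active")
        return ctx


@contextmanager
def activate(ctx: ExampleContext) -> Iterator[ExampleContext]:
    """Install `ctx` as the current context for the duration of the block."""
    previous = ExampleContext._current
    ExampleContext._current = ctx
    try:
        yield ctx
    finally:
        ExampleContext._current = previous


print(ExampleContext.try_get())  # None
with activate(ExampleContext()) as ctx:
    print(ExampleContext.get() is ctx)  # True
```

Keeping both accessors lets a type checker distinguish call sites that rely on an active context from those that legitimately run without one.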
Jez Ng	a8cf04fd2a	[inductor] Make {output_graph,pad_mm}.py pass follow_imports typechecking (#113413 ) I changed OutputGraph.nn_modules' type to `Dict[str, Any]` because it seems that `register_attr_or_module` can populate it with essentially any type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113413 Approved by: https://github.com/Skylion007	2023-11-11 22:15:46 +00:00
Jez Ng	b0ede09682	[inductor] Make pattern_matcher.py pass follow_imports typechecking (#113409 ) Import following reveals that a good number of hints were wrong... Pull Request resolved: https://github.com/pytorch/pytorch/pull/113409 Approved by: https://github.com/Skylion007	2023-11-10 19:58:08 +00:00
Jason Ansel	9664190952	[dynamo] Eagerly install guards (#111415 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111415 Approved by: https://github.com/voznesenskym ghstack dependencies: #111306	2023-11-07 19:55:19 +00:00
Jason Ansel	4b8a5e1854	[dynamo] Remove VariableTracker.as_specialized (#112363 ) My local testing can't seem to find this function actually doing anything. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112363 Approved by: https://github.com/yanboliang	2023-10-30 20:07:55 +00:00
Peter Bell	bbd5b935e4	Use `pytree.tree_leaves` everywhere (#112324 ) This changes all the instances I could find of `tree_flatten(...)[0]` or `x, _ = tree_flatten` to use `tree_leaves`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324 Approved by: https://github.com/lezcano ghstack dependencies: #112327, #112323	2023-10-30 03:39:04 +00:00
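The rewrite described in the commit above replaces "flatten, then discard the spec" with a call that returns only the leaves. The sketch below uses toy stand-ins (`toy_tree_flatten`, `toy_tree_leaves`) written for this illustration; they handle only lists, tuples, and dicts and are not the `pytree` implementation.

```python
from typing import Any, List, Tuple


def toy_tree_flatten(tree: Any) -> Tuple[List[Any], Any]:
    """Toy stand-in: return (leaves, spec) for nested lists, tuples, dicts."""
    if isinstance(tree, (list, tuple)):
        children = list(tree)
        spec_head: Any = type(tree).__name__
    elif isinstance(tree, dict):
        children = list(tree.values())
        spec_head = ("dict", list(tree.keys()))
    else:
        return [tree], "leaf"
    leaves: List[Any] = []
    child_specs = []
    for child in children:
        child_leaves, child_spec = toy_tree_flatten(child)
        leaves.extend(child_leaves)
        child_specs.append(child_spec)
    return leaves, (spec_head, child_specs)


def toy_tree_leaves(tree: Any) -> List[Any]:
    """Toy stand-in: return only the leaves; no spec is built."""
    if isinstance(tree, (list, tuple)):
        children = tree
    elif isinstance(tree, dict):
        children = tree.values()
    else:
        return [tree]
    leaves: List[Any] = []
    for child in children:
        leaves.extend(toy_tree_leaves(child))
    return leaves


nested = {"a": [1, 2], "b": (3, {"c": 4})}
old_style = toy_tree_flatten(nested)[0]  # builds a spec, then throws it away
new_style = toy_tree_leaves(nested)      # states the intent directly

print(old_style == new_style)  # True
```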
lezcano	c8a5bb451e	Do not import sympy within torch._prims_common (#112034 ) This is the first of a few PRs that avoid importing SymPy at import time. The pitch here is that we (almost!) do not have SymPy on our API, so this should be feasible. This should speed-up torch imports by a good 15% as per https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589 In this PR we just move a few global imports into local imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034 Approved by: https://github.com/ezyang	2023-10-26 12:53:25 +00:00
voznesenskym	9455af58b5	[easy][dynamo] Cleanup guard builder selection (#111723 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111723 Approved by: https://github.com/jon-chuang, https://github.com/jansel	2023-10-21 10:48:32 +00:00
Animesh Jain	58637c4b43	[dynamo] Remove SuperSource (#110475 ) The motivation for removing this is already present in the pre-PR comments. Copying it ~~~ # NB - SuperSource is a weird one. # it is our only source with 2 bases, so we use the object # as the base, rather than the type, since an invocation # like super(Foo, foo) is represented here, the source object base is more spiritually # aligned with the instance, rather than the type. # This whole construction is questionable tho, and we should probably find a way to # avoid this exception to our otherwise nice source parentage invariant. ~~~ Instead of using super(a, b), we can use `type(b).__mro__[index]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110475 Approved by: https://github.com/jansel	2023-10-08 04:45:06 +00:00
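The replacement of `super(a, b)` with `type(b).__mro__[index]` mentioned in the commit above can be illustrated in plain Python. The classes below are hypothetical, and the way `index` is computed here (the position just after `a` in the MRO) is an assumption, since the commit message does not spell it out.

```python
class Base:
    def describe(self):
        return "Base"


class Middle(Base):
    def describe(self):
        return "Middle"


class Leaf(Middle):
    def describe(self):
        return "Leaf"


obj = Leaf()
mro = type(obj).__mro__        # (Leaf, Middle, Base, object)
index = mro.index(Middle) + 1  # super(Middle, obj) starts looking after Middle

via_super = super(Middle, obj).describe()
via_mro = mro[index].describe(obj)  # explicit class lookup, instance passed by hand

print(via_super, via_mro)  # Base Base
```

The two lookups agree in this single-inheritance example. With multiple inheritance, attribute lookup on `mro[index]` follows that class's own MRO rather than the remainder of `type(obj).__mro__`, so the equivalence should not be assumed in general.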
chilli	005e8ddcb9	cache the hash construction on Guard (#110464 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110464 Approved by: https://github.com/zou3519, https://github.com/voznesenskym	2023-10-04 04:49:18 +00:00
Edward Yang	88600e7d2e	[RELAND] Force synced KJT to trace unbacked SymInt (#108960 ) (#109216 ) Summary: The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers, however, for KJTs, we know that these integers will definitely vary from run-to-run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1. The fix is to detect KJTs and treat these integers as unbacked integers. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples. The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked. Test Plan: ``` buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu ``` from aakhundov 1. first build feed_lower_benchmark: ``` buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark ``` 2. 
then run the lowering of the model with it: ``` TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace ``` cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0 From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/ From ge0405 baseline (without your diff): f477293168 your diff: f477292363 ``` buck2 test //caffe2/test/dynamo:test_dynamo_torchrec buck2 run 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup' ``` Differential Revision: D49236757 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109216 Approved by: https://github.com/voznesenskym	2023-09-18 14:39:44 +00:00
PyTorch MergeBot	1d32c9c7f2	Revert "Force synced KJT to trace unbacked SymInt (#108960 )" This reverts commit `f9a250c35b`. Reverted https://github.com/pytorch/pytorch/pull/108960 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/108960#issuecomment-1715850779))	2023-09-12 14:37:36 +00:00
Edward Yang	f9a250c35b	Force synced KJT to trace unbacked SymInt (#108960 ) Summary: The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers, however, for KJTs, we know that these integers will definitely vary from run-to-run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1. The fix is to detect KJTs and treat these integers as unbacked integers. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples. The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked. Test Plan: ``` buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu ``` from aakhundov 1. first build feed_lower_benchmark: ``` buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark ``` 2. 
then run the lowering of the model with it: ``` TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace ``` cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0 From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/ From ge0405 baseline (without your diff): f477293168 your diff: f477292363 Differential Revision: D49019987 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108960 Approved by: https://github.com/voznesenskym	2023-09-12 03:44:24 +00:00
Edward Z. Yang	66f67d9a25	Print restart attempt as part of Dynamo log context (#108864 ) Now looks like: ``` [2023-09-08 06:04:48,532] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_ATTR foo [ConstantVariable(int), NNModule Variable()] [2023-09-08 06:04:48,532] [0/0] torch._dynamo.convert_frame: [INFO] Restarting analysis due to _dynamo/variables/nn_module.py :138 in convert_to_unspecialized [2023-09-08 06:04:48,533] [0/0_1] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing f /data/users/ezyang/c/pytorch/a.py:6 [2023-09-08 06:04:48,533] [0/0_1] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line f /data/users/ezyang/c/pytorch/a.py:6 ``` I'm happy to bikeshed the exact formatting of the attempt number if you want. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/108864 Approved by: https://github.com/mlazos, https://github.com/voznesenskym	2023-09-08 23:00:19 +00:00
Huy Do	5a4fe05a15	Revert "Force synced KJT to trace unbacked SymInt (#107788 )" (#108684 ) This reverts commit `3b92ef814d`. So let's manually revert it instead. (Not sure why the bot doesn't work on https://github.com/pytorch/pytorch/pull/107788) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108684 Approved by: https://github.com/ezyang	2023-09-06 19:15:45 +00:00
Edward Z. Yang	3b92ef814d	Force synced KJT to trace unbacked SymInt (#107788 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107788 Approved by: https://github.com/voznesenskym	2023-09-06 03:18:26 +00:00
Animesh Jain	a506d0ad8f	[dynamo] Store originating source in the Guard object (#107634 ) Many times, I find myself wanting to know the source for the guard. This PR adds that as a field of guard itself. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107634 Approved by: https://github.com/voznesenskym ghstack dependencies: #107622	2023-08-22 02:16:31 +00:00
Edward Z. Yang	8292b03c47	Use fast traceback for symbolic shapes (#107439 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107439 Approved by: https://github.com/voznesenskym ghstack dependencies: #107505, #107516, #107530, #107532, #107562, #107471	2023-08-22 01:03:13 +00:00
Edward Z. Yang	8316affc45	Add frame/recompile counter to all log messages in tracing context (#107530 ) All log messages that occur while running Dynamo compilation now have `[X/Y]` added to the beginning of their message. X represents the frame being compiled, while Y says which compilation of the frame. For example, if you are debugging a frame that is repeatedly recompiling, you can look for N/0, N/1, N/2, etc. for the same N. Here is what the logs look like as you transition from one frame to another: <img width="1372" alt="image" src="https://github.com/pytorch/pytorch/assets/13564/4897e368-1e50-4807-b342-54e911bcf087"> To accurately get this prefix added to all messages, I had to expand the scope of the `tracing` context manager. Its scope now coincides with `log_compilation_event`. To do this, I had to populate fake mode lazily in the TracingContext, since it isn't created until later, inside the OutputGraph. This subsumes the previous X.Y logging that was solely for dynamic shapes. Unfortunately I had to reindent some stuff. Review the diff with whitespace off. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107530 Approved by: https://github.com/anijain2305 ghstack dependencies: #107505, #107516	2023-08-21 13:02:12 +00:00
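One way to get an `[X/Y]` prefix onto every log message emitted during a compilation, as the commit above describes, is a logging filter that reads an ambient compile id. The sketch below is an illustration under assumptions, not the Dynamo implementation: `_compile_id`, `CompileIdFilter`, and the logger name `example.compile` are all hypothetical.

```python
import contextvars
import io
import logging

# Hypothetical ambient state: (frame id, recompile count), or None when idle.
_compile_id = contextvars.ContextVar("compile_id", default=None)


class CompileIdFilter(logging.Filter):
    """Attach an "[X/Y] " prefix to records emitted while a compile id is set."""

    def filter(self, record):
        cid = _compile_id.get()
        record.compile_prefix = f"[{cid[0]}/{cid[1]}] " if cid is not None else ""
        return True  # never drop the record, only annotate it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(CompileIdFilter())
handler.setFormatter(logging.Formatter("%(compile_prefix)s%(name)s: %(message)s"))

log = logging.getLogger("example.compile")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

# Frame 3 compiled twice: same X, increasing Y.
for attempt in range(2):
    token = _compile_id.set((3, attempt))
    try:
        log.info("compiling frame")
    finally:
        _compile_id.reset(token)

log.info("outside any compilation")
output = stream.getvalue()
print(output, end="")
```

Searching the resulting log for `[3/0]`, `[3/1]`, and so on then isolates the successive compilations of one frame.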
Edward Z. Yang	d6d485fa8c	Revamp guard debug logging (#107505 ) The new guard printout looks like this:
```
[DEBUG] GUARDS:
[DEBUG] ___check_type_id(L['name'], 7605632) # if name == "special_attr": # test/dynamo/test_misc.py:1155 in __getattribute__
[DEBUG] L['name'] == '_backward_pre_hooks' # if name == "special_attr": # test/dynamo/test_misc.py:1155 in __getattribute__
[DEBUG] ___check_obj_id(L['self'], 139746432564960) # return super().__getattribute__(name) # test/dynamo/test_misc.py:1157 in __getattribute__
[DEBUG] ___check_obj_id(L['__class__'], 1451499216) # return super().__getattribute__(name) # test/dynamo/test_misc.py:1157 in __getattribute__
[DEBUG] ___is_grad_enabled() # _dynamo/output_graph.py:346 in init_ambient_guards
[DEBUG] not ___are_deterministic_algorithms_enabled() # _dynamo/output_graph.py:342 in init_ambient_guards
[DEBUG] ___is_torch_function_enabled() # _dynamo/output_graph.py:350 in init_ambient_guards
[DEBUG] utils_device.CURRENT_DEVICE == None # _dynamo/output_graph.py:348 in init_ambient_guards
```
Along with the guards, we also print what line of user code caused the guard to be added, or what line of Dynamo internal code added the guard (if there is no user stack trace, which is typically the case for ambient guards.) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107505 Approved by: https://github.com/mlazos, https://github.com/voznesenskym, https://github.com/anijain2305	2023-08-20 06:50:27 +00:00
Edward Z. Yang	67bb3c05b0	Add verbose_guards logging artifact (#107388 ) It looks like this: ``` [DEBUG] GUARD: ___check_type_id(L['z'][L["MyEnum"].BAR], 7640416) and L['z'][L["MyEnum"].BAR] == 10 [DEBUG] Stack: [DEBUG] File "/data/users/ezyang/b/pytorch/test/dynamo/test_misc.py", line 6657, in <module> [DEBUG] run_tests() [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/test_case.py", line 38, in run_tests [DEBUG] run_tests() [DEBUG] File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 985, in run_tests [DEBUG] unittest.main(argv=argv) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/main.py", line 101, in __init__ [DEBUG] self.runTests() [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/main.py", line 271, in runTests [DEBUG] self.result = testRunner.run(self.test) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/runner.py", line 184, in run [DEBUG] test(result) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__ [DEBUG] return self.run(args, kwds) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run [DEBUG] test(result) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__ [DEBUG] return self.run(args, *kwds) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run [DEBUG] test(result) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/case.py", line 650, in __call__ [DEBUG] return self.run(args, *kwds) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 2521, in run [DEBUG] self._run_with_retry( [DEBUG] File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 2450, in _run_with_retry [DEBUG] super_run(result=result) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/case.py", 
line 591, in run [DEBUG] self._callTestMethod(testMethod) [DEBUG] File "/home/ezyang/local/b/pytorch-env/lib/python3.10/unittest/case.py", line 549, in _callTestMethod [DEBUG] method() [DEBUG] File "/data/users/ezyang/b/pytorch/torch/testing/_internal/common_utils.py", line 2377, in wrapper [DEBUG] method(args, *kwargs) [DEBUG] File "/data/users/ezyang/b/pytorch/test/dynamo/test_misc.py", line 2529, in test_enum_as_dict_key_with_overloaded_str [DEBUG] res = opt_fn(x) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 333, in _fn [DEBUG] return fn(args, *kwargs) [DEBUG] File "/data/users/ezyang/b/pytorch/test/dynamo/test_misc.py", line 2519, in fn [DEBUG] torch._dynamo.graph_break() [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 493, in catch_errors [DEBUG] return callback(frame, cache_size, hooks, frame_state) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 637, in _convert_frame [DEBUG] result = inner_convert(frame, cache_size, hooks, frame_state) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 133, in _fn [DEBUG] return fn(args, *kwargs) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 371, in _convert_frame_assert [DEBUG] return _compile( [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 567, in _compile [DEBUG] guarded_code = compile_inner(code, one_graph, hooks, transform) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/utils.py", line 181, in time_wrapper [DEBUG] r = func(args, kwargs) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 466, in compile_inner [DEBUG] out_code = transform_code_object(code, transform) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object [DEBUG] transformations(instructions, code_options) [DEBUG] File 
"/data/users/ezyang/b/pytorch/torch/_dynamo/convert_frame.py", line 416, in transform [DEBUG] tracer = InstructionTranslator( [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/symbolic_convert.py", line 2018, in __init__ [DEBUG] self.symbolic_locals = collections.OrderedDict( [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/symbolic_convert.py", line 2021, in <genexpr> [DEBUG] VariableBuilder( [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 211, in __call__ [DEBUG] vt = self._wrap(value).clone(self.options()) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 404, in _wrap [DEBUG] result = { [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 405, in <dictcomp> [DEBUG] k: VariableBuilder( [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 211, in __call__ [DEBUG] vt = self._wrap(value).clone(*self.options()) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 354, in _wrap [DEBUG] return type_dispatch(self, value) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 837, in wrap_literal [DEBUG] return self.wrap_unspecialized_primitive(value) [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 1073, in wrap_unspecialized_primitive [DEBUG] guards=self.make_guards(GuardBuilder.CONSTANT_MATCH), [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 269, in make_guards [DEBUG] return {source.make_guard(guard) for guard in guards} [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_dynamo/variables/builder.py", line 269, in <setcomp> [DEBUG] return {source.make_guard(guard) for guard in guards} [DEBUG] File "/data/users/ezyang/b/pytorch/torch/_guards.py", line 641, in make_guard [DEBUG] return Guard(self.name(), self.guard_sou ``` One downside is I can't report why* the guard was added. 
I'm not entirely sure how to do this; the problem is guards will propagate to a bunch of variables before finally getting included as part of the final set. Maybe a very very verbose version could report stack traces at every handoff point. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107388 Approved by: https://github.com/mlazos ghstack dependencies: #107438, #107358	2023-08-18 19:05:54 +00:00
eellison	e9ae820279	Unfuse bias add before pointwise ops (#106912 ) I get a 2% inference speedup in HF with this PR. I checked to see if there any models where unfusing was slower than the cublas gelu fusion, and I did not see any, which was surprising to me. Sorry for the cublas-activation api churn 😬 Kicking off another run in cublas 12, it's possible that the results have changed since. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106912 Approved by: https://github.com/jansel ghstack dependencies: #106911	2023-08-16 17:22:24 +00:00
Edward Z. Yang	91afefb55b	Fix some fake mode confusion between inner/outer fake mode in export (#106515 ) Fixes https://github.com/pytorch/pytorch/issues/106412 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106515 Approved by: https://github.com/voznesenskym, https://github.com/BowenBao, https://github.com/thiagocrepaldi	2023-08-04 15:42:23 +00:00
Edward Z. Yang	697893568d	Improve error message when export encounters non-local input (#106403 ) Previously, you would get an error like
```
Dynamo input and output is a strict subset of traced input/output
```
now you get
```
Cannot export model which references tensors that are neither buffers/parameters/constants nor are direct inputs. For each tensor, if you'd like this tensor to be an explicit input, add it as a dummy argument to the top-level model definition you are exporting; if you would like its value to be embedded as an exported constant, wrap its access in a function marked with @assume_constant_result.

G['bulbous_bouffant'], accessed at:
  File "test_export.py", line N, in f
    return bulbous_bouffant + y
```
This doesn't handle outputs, I'm going to hit that next. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106403 Approved by: https://github.com/tugsbayasgalan	2023-08-03 12:35:25 +00:00
Edward Z. Yang	76163a56c0	Refactor stack handling to always use TracingContext to populate real stack on exception (#106277 ) The basic gist of the PR is simple, but it's accompanied with some careful modifications and unit tests to make sure I got it right. Check inline comments for more details. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106277 Approved by: https://github.com/albanD, https://github.com/voznesenskym	2023-08-02 00:09:16 +00:00
Edward Z. Yang	884cd53e49	Unconditionally record when FakeTensorMode is allocated and report it on inconsistency (#105927 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105927 Approved by: https://github.com/albanD	2023-07-26 03:38:42 +00:00
Edward Z. Yang	523100a2f1	Make _CURRENT_TRACING_CONTEXT thread local (#105942 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105942 Approved by: https://github.com/albanD, https://github.com/voznesenskym	2023-07-26 03:38:01 +00:00
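To show what making the current tracing context thread local means in practice, here is a minimal sketch built on `threading.local`. The names `_TLS`, `get_context`, and `set_context` are hypothetical; the actual change in #105942 concerns `_CURRENT_TRACING_CONTEXT`, whose surrounding code is not reproduced here.

```python
import threading

_TLS = threading.local()  # hypothetical module-level thread-local storage


def get_context():
    # A thread that never set a context sees None, not another thread's value.
    return getattr(_TLS, "tracing_context", None)


def set_context(ctx):
    _TLS.tracing_context = ctx


results = {}


def worker(name):
    # Each thread sees only the context it installed itself.
    results[name + "_before"] = get_context()
    set_context(f"context-for-{name}")
    results[name + "_after"] = get_context()


set_context("main-context")
threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_after = get_context()
```

With a plain module-level global, the workers would have seen and overwritten `"main-context"`; with thread-local storage the main thread's value is untouched.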
Michael Lazos	05eea20eb9	[dynamo] Simulate torch function enablement state (#105091 ) Part of https://github.com/pytorch/pytorch/issues/93723 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105091 Approved by: https://github.com/voznesenskym, https://github.com/anijain2305	2023-07-13 17:42:20 +00:00
Edward Z. Yang	979f826015	Read out real strides from compilation result, rather than real args (#105010 ) This prefigures a refactor that will move the backward compilation to entirely ahead of time, so I need to extract these strides some other way. Straight from the compiler's mouth will do it. I can't easily get the information via the return result of `fw_compiler` without changing the calling convention, so instead I smuggle it via TracingContext. TracingContext may be None when we are compiling patterns for the joint graph pattern matcher. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105010 Approved by: https://github.com/shunting314	2023-07-12 11:33:08 +00:00
Michael Voznesensky	e5e9d563c2	Lift user defined attributes into inputs for certain cases (user defined types and tensors) (#103386 ) (1) Lazy (converts to dynamo variable on access only) (2) Uses existing side effect/reconstruct tech (3) not tensor opinionated Pull Request resolved: https://github.com/pytorch/pytorch/pull/103386 Approved by: https://github.com/jansel	2023-06-20 23:45:19 +00:00
Thiago Crepaldi	6f655d4195	Add symbolic tracing support to torch._dynamo.export (fake input + weights) (#100017 ) Fixes #95900 Using the following repro as guide:
```python
import torch
import torch._dynamo
from torch._subclasses import fake_tensor
from torch.fx.experimental.symbolic_shapes import ShapeEnv
from torch._dynamo.output_graph import config


class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.linear2 = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        out = self.linear2(out)
        return out


fake_mode = fake_tensor.FakeTensorMode(
    allow_non_fake_inputs=False,
    allow_fallback_kernels=True,
    shape_env=ShapeEnv(
        allow_scalar_outputs=config.capture_scalar_outputs,
        allow_dynamic_output_shape_ops=config.capture_dynamic_output_shape_ops,
        frame_id=0,
    ),
)
# Fakefying input/model before calling torch._dynamo.export
with fake_mode:
    fake_x = torch.rand(5, 2, 2)
    model = Model()

# Calling torch._dynamo.export without active fake mode
graph_module, guards = torch._dynamo.export(
    model, fake_x, aten_graph=True, fake_mode=fake_mode
)
graph_module.print_readable()
graph_module.graph.print_tabular()
```
Summary of changes:
* Plumb fake_mode through torch.export API. When specified, it replaces the creation of a new FakeTensorMode at InstructionTranslator on behalf of OutputGraph
* Hacks FakeTensor.__new__ to prevent a torch.Tensor._make_subclass call for inputs that are already fakefied by user. This probably needs to be fixed in a nicer way. Any idea?
* Removed a few asserts that didn't want faked tensors coming from user script
* Added torch._subclasses.fake_tensor.FakeTensor to type list on a few assert checks to allow fake inputs
The changes above allowed symbolic tracing with both static and dynamic shapes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100017 Approved by: https://github.com/ezyang	2023-06-15 21:28:10 +00:00
Michael Voznesensky	056bf951bf	Strengthen partially supported invariant of base for chained sources (#103445 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103445 Approved by: https://github.com/ezyang	2023-06-13 22:44:28 +00:00
Elias Ellison	d083d444ff	Inductor Freezing (#100652 ) Adds a freezing pass that will constant fold parameters in inductor, controlled by `config.freezing`. This occurs post functionalization in aot autograd to capture both dispatching and allow passes to occur post functionalization. A few notes:
- There is an option to discard parameters, `config.freezing_discard_parameters`, which will take the current eager modules and wrap parameters to a Tensor subclass which will error if used.
- I needed to expose flat_params in aot_autograd in order to discard old references when we constant fold away parameters, like with amp. I also exposed `fw_metadata` to avoid constant folding mutated parameters.
- Caching parameter transformations/constant folding across different inferences nyi
- Checking version_counter of constant folded params nyi
I'm not really sure what the actual naming should be. In jit there was both "freezing", which was platform agnostic, and "optimize for inference", which made device specific optimizations. We're doing the latter here but maybe freezing is a better name. Differential Revision: [D46244033](https://our.internmc.facebook.com/intern/diff/D46244033) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100652 Approved by: https://github.com/jansel	2023-06-12 20:56:03 +00:00
Mark Saroufim	95fced4483	Pretty dataclass dynamo explain (#102869 ) Also thinking out loud: maybe we only print graph break reasons? And for the rest we have a verbose print which prints everything? TODO: some tests are failing based on what they expect a guard string to look like, easy to fix i'll do it early next week # After ``` (sourcetorch) ubuntu@ip-172-31-1-136:~/test$ python pretty.py BREAK Graph Count: 2 Graph Break Count: 1 Op Count: 2 Break Reasons: Break Reason 1: Reason: call_function BuiltinVariable(print) [ConstantVariable(str)] {} User Stack: <FrameSummary file /home/ubuntu/test/pretty.py, line 6 in fn> Ops per Graph: Ops 1: <built-in function add> Ops 2: <built-in function add> Out Guards: Guard 1: Name: '' Source: global Create Function: GRAD_MODE Guard Types: ['GRAD_MODE'] Code List: ['___is_grad_enabled()'] Object Weakref: None Guarded Class Weakref: None Guard 2: Name: '' Source: global Create Function: DEFAULT_DEVICE Guard Types: ['DEFAULT_DEVICE'] Code List: ['utils_device.CURRENT_DEVICE == None'] Object Weakref: None Guarded Class Weakref: None Guard 3: Name: "G['print']" Source: global Create Function: BUILTIN_MATCH Guard Types: None Code List: None Object Weakref: None Guarded Class Weakref: None Guard 4: Name: '' Source: global Create Function: DETERMINISTIC_ALGORITHMS Guard Types: ['DETERMINISTIC_ALGORITHMS'] Code List: ['not ___are_deterministic_algorithms_enabled()'] Object Weakref: None Guarded Class Weakref: None Guard 5: Name: "L['x']" Source: local Create Function: TENSOR_MATCH Guard Types: None Code List: None Object Weakref: None Guarded Class Weakref: None Guard 6: Name: '' Source: global Create Function: GRAD_MODE Guard Types: ['GRAD_MODE'] Code List: ['___is_grad_enabled()'] Object Weakref: None Guarded Class Weakref: None Guard 7: Name: '' Source: global Create Function: DEFAULT_DEVICE Guard Types: ['DEFAULT_DEVICE'] Code List: ['utils_device.CURRENT_DEVICE == None'] Object Weakref: None Guarded Class Weakref: None Guard 8: 
Name: '' Source: global Create Function: DETERMINISTIC_ALGORITHMS Guard Types: ['DETERMINISTIC_ALGORITHMS'] Code List: ['not ___are_deterministic_algorithms_enabled()'] Object Weakref: None Guarded Class Weakref: None Guard 9: Name: "L['x']" Source: local Create Function: TENSOR_MATCH Guard Types: None Code List: None Object Weakref: None Guarded Class Weakref: None Compile Times: TorchDynamo compilation metrics: Function Runtimes (s) ------------------------------ -------------- _compile 0.0164, 0.0035 OutputGraph.call_user_compiler 0.0000, 0.0000 ``` ## Before ``` ('Dynamo produced 2 graphs with 1 graph break and 2 ops', [{Guard(name='print', source=<GuardSource.GLOBAL: 1>, create_fn=<function GuardBuilder.BUILTIN_MATCH at 0x7f92ea5009d0>, is_volatile=False, guard_types=None, code_list=None, obj_weakref=None, guarded_class_weakref=None), Guard(name='x', source=<GuardSource.LOCAL: 0>, create_fn=<function GuardBuilder.TENSOR_MATCH at 0x7f92ea501000>, is_volatile=False, guard_types=['TENSOR_MATCH'], code_list=None, obj_weakref=<weakref at 0x7f9224d28f40; dead>, guarded_class_weakref=<weakref at 0x7f92d81734c0; to 'torch._C._TensorMeta' at 0x540b610 (Tensor)>)}, {Guard(name='x', source=<GuardSource.LOCAL: 0>, create_fn=<function GuardBuilder.TENSOR_MATCH at 0x7f92ea501000>, is_volatile=False, guard_types=['TENSOR_MATCH'], code_list=None, obj_weakref=<weakref at 0x7f9224d5e700; dead>, guarded_class_weakref=<weakref at 0x7f92d81734c0; to 'torch._C._TensorMeta' at 0x540b610 (Tensor)>)}], [GraphModule(), GraphModule()], [[<built-in function add>], [<built-in function add>]], [GraphCompileReason(reason='call_function BuiltinVariable(print) [ConstantVariable(str)] {}', user_stack=[<FrameSummary file <ipython-input-1-9e2ddb639697>, line 6 in fn>]), GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file <ipython-input-1-9e2ddb639697>, line 8 in <graph break in fn>>])], 'Dynamo produced 2 graphs with 1 graph break and 2 ops\n Break reasons: \n\n1. 
call_function BuiltinVariable(print) [ConstantVariable(str)] {}\n File "<ipython-input-1-9e2ddb639697>", line 6, in fn\n print("BREAK")\n \n2. return_value\n File "<ipython-input-1-9e2ddb639697>", line 8, in <graph break in fn>\n return x\n \nTorchDynamo compilation metrics:\nFunction Runtimes (s)\n------------------------------ --------------\n_compile 0.0418, 0.0084\nOutputGraph.call_user_compiler 0.0001, 0.0001') ``` ## Program ```python import torch import torch._dynamo def fn(x): x = x + 1 print("BREAK") x = x + 1 return x out = torch._dynamo.explain(fn, torch.randn(10)) print(out) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/102869 Approved by: https://github.com/voznesenskym	2023-06-07 22:38:57 +00:00
Yanbo Liang	9ff1932d2b	[Dynamo] Save global autocast state to restore on graph break (#102415 ) Fixes #102414 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102415 Approved by: https://github.com/yf225	2023-05-30 23:03:21 +00:00
Animesh Jain	dafa009c3c	[dynamo][moco] Save global torch state to restore on graph break (#101201 ) This is relevant to https://github.com/pytorch/pytorch/pull/100570 as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101201 Approved by: https://github.com/voznesenskym	2023-05-18 01:03:15 +00:00
Michael Voznesensky	ffcbd1c2de	Move tracked nn_modules from OutputGraph to TracingContext (#100457 ) Lint Pull Request resolved: https://github.com/pytorch/pytorch/pull/100457 Approved by: https://github.com/anijain2305	2023-05-03 02:00:11 +00:00
Edward Z. Yang	d69a1a4491	In detect_fake_mode, assert that all detected fake modes are consistent (#99392 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99392 Approved by: https://github.com/eellison	2023-04-18 15:35:05 +00:00
Edward Z. Yang	5c38c4cfa4	Improve symbolic shapes guard logging (#98941 ) Billing of changes:
* Get rid of `print_guards`; instead, you control this with `TORCH_LOGS=torch.fx.experimental.symbolic_shapes`, debug logging toggles stack traces
* Don't incorrectly report the tracing context frame when we're compiling; we just don't have this info anymore! (TODO: use the saved frames instead). This is via a new TracingContext.clear_frame context manager
* Add TracingContext.extract_stack() which gives you the tracing context stack.
* Add ShapeEnvLoggingAdapter to report which ShapeEnv any given operation is from (this is helpful for debugging situations when there are too many ShapeEnvs floating around)
* Tweak create_symbol log message to also report Source
* Add a debug log whenever duck sizing occurs
* Report an excerpt of both the user and system backtrace whenever a guard is added in INFO mode. I found this is a good balance of "where did the guard come from" without full backtrace verbosity.
Example of the new log output:
```
[2023-04-12 08:25:49,003] torch.fx.experimental.symbolic_shapes: [INFO] 0: create_env
[2023-04-12 08:25:49,021] torch.fx.experimental.symbolic_shapes: [INFO] 0: create_symbol s0 = 32 for L['x'].size()[0]
[2023-04-12 08:25:50,154] torch.fx.experimental.symbolic_shapes: [INFO] 0: evaluate_expr s0 < 128 [guard added] at w.py:11 in forward2 (_dynamo/variables/tensor.py:476 in evaluate_expr)
[2023-04-12 08:25:52,057] torch.fx.experimental.symbolic_shapes: [INFO] 0: evaluate_expr Eq(Mod(s0, 16), 0) [guard added] (_inductor/codegen/triton.py:77 in is_aligned)
```
from running
```
import torch
import torch._dynamo

def f(x, y):
    return x + y

def forward(x, y):
    return forward2(x, y)

def forward2(x, y):
    if x.size(0) < 128:
        x = x * 2
    else:
        x = x * 3
    r = f(x, y)
    r = r * y
    return r

def woof():
    fn_compiled = torch.compile(forward, dynamic=True)
    x = torch.randn(32, device='cuda')
    y = torch.randn(32, device='cuda')
    print(fn_compiled(x, y))

woof()
```
(To induce the Triton guard, I synthetically reverted https://github.com/pytorch/pytorch/pull/98471) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98941 Approved by: https://github.com/wconstab	2023-04-12 21:58:59 +00:00
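The `0:` tag at the start of each message above identifies which ShapeEnv emitted it. A similar effect can be sketched with the standard `logging.LoggerAdapter`; `EnvIdAdapter` and the `env_id` key are hypothetical names chosen for this example, and the real `ShapeEnvLoggingAdapter` may differ in detail.

```python
import io
import logging


class EnvIdAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that tags each message with an environment id."""

    def process(self, msg, kwargs):
        return f"{self.extra['env_id']}: {msg}", kwargs


stream = io.StringIO()
base = logging.getLogger("shape_env_demo")
base.addHandler(logging.StreamHandler(stream))
base.setLevel(logging.INFO)
base.propagate = False

# One adapter per environment, all sharing the same underlying logger.
env0 = EnvIdAdapter(base, {"env_id": 0})
env1 = EnvIdAdapter(base, {"env_id": 1})
env0.info("create_env")
env1.info("create_symbol s0 = 32")

lines = stream.getvalue().splitlines()
```

Because each environment owns its adapter, messages from several environments can be interleaved in one log and still be told apart.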
Edward Z. Yang	9abae6ae32	Make all Source subclasses frozen. (#98737 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98737 Approved by: https://github.com/albanD	2023-04-10 17:51:10 +00:00
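One practical effect of freezing source classes is that instances become immutable and hashable, so equal sources can be deduplicated in sets or used as dictionary keys. The sketch below uses simplified stand-ins named `LocalSource` and `AttrSource`; their fields are assumptions made for the example, not the real class definitions.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class LocalSource:
    local_name: str


@dataclasses.dataclass(frozen=True)
class AttrSource:
    base: LocalSource
    member: str


a = AttrSource(LocalSource("self"), "weight")
b = AttrSource(LocalSource("self"), "weight")

# Equal frozen sources hash equally, so they can key dicts and sets.
seen = {a}
is_deduplicated = b in seen

try:
    a.member = "bias"  # assignment on a frozen dataclass is rejected
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False
```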
Edward Z. Yang	d01ee10b25	Add detect_fake_mode (#98321 ) This replaces fake_mode_from_tensors but it preferentially looks for fake_mode in TracingContext and also if there is an active fake mode on the dispatch stack, before groveling in tensors to find it. This advances PegasusForCausalLM, which was previously failing because we generated a graph that had a parameter (non-fake) and a SymInt, and thus previously we failed to detect the correct fake mode. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98321 Approved by: https://github.com/voznesenskym	2023-04-05 22:15:16 +00:00
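The lookup order described in #98321 (tracing context first, then the dispatch stack, then the tensors themselves) can be sketched as follows. All names here (`FakeMode`, `FakeTensor`, `detect_fake_mode_sketch`, and its three parameters) are hypothetical placeholders, and the consistency assertion mirrors what the later change #99392 describes rather than this commit.

```python
from typing import Iterable, Optional


class FakeMode:
    """Hypothetical placeholder for a fake tensor mode."""


class FakeTensor:
    """Hypothetical placeholder for a tensor that remembers its mode."""

    def __init__(self, fake_mode: FakeMode):
        self.fake_mode = fake_mode


def detect_fake_mode_sketch(
    context_mode: Optional[FakeMode],
    active_modes: Iterable[object],
    inputs: Iterable[object],
) -> Optional[FakeMode]:
    # Gather candidates in priority order: the tracing context first, then any
    # mode active on the dispatch stack, then modes found on the inputs.
    candidates = []
    if context_mode is not None:
        candidates.append(context_mode)
    candidates.extend(m for m in active_modes if isinstance(m, FakeMode))
    candidates.extend(x.fake_mode for x in inputs if isinstance(x, FakeTensor))

    if not candidates:
        return None
    first = candidates[0]
    # Every detected mode must be the same object; otherwise tensors from
    # different modes are being mixed.
    assert all(m is first for m in candidates), "inconsistent fake modes"
    return first
```

Checking the context before the inputs is what lets a graph whose only tensor input is a real (non-fake) parameter still resolve to the right mode.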
Will Constable	c1a6dde79e	Make dynamo-FSDP skip guards (#97463 ) Create a new GuardSource for FSDP modules, and use it to opt out of guard installation. Based on @awgu's work in https://github.com/pytorch/pytorch/pull/97091 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97463 Approved by: https://github.com/voznesenskym, https://github.com/jansel, https://github.com/awgu	2023-03-28 04:04:34 +00:00
Michael Voznesensky	f9ce593267	Extend aot autograd dedup guards to params, stop using positions (#96774 ) The purpose of this PR is to remove reliance on argument positions in dedup guards, AND extend the functionality to params. A version of this PR was stamped prior https://github.com/pytorch/pytorch/pull/95831 - but was kinda gross, because it was based on an underlying PR that did way too much with source names. This PR leaves most of that alone, in favor of just reusing the same name standardization logic that dynamo module registration does. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96774 Approved by: https://github.com/ezyang	2023-03-21 05:59:33 +00:00
Avik Chaudhuri	e4e761b277	record caller frame instead of function frame (#96882 ) Previously, when starting to trace a function, we would record a frame summary recording the definition loc. This would lead to an unconventional-looking stack trace when used for debugging, e.g., shape guards. ``` File ".../scripts/avik/pt2/example.py", line 407, in forward def forward(self, x): ... File ".../transformers/models/bert/modeling_bert.py", line 912, in forward @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length")) ... File ".../transformers/models/bert/modeling_bert.py", line 562, in forward def forward( ... File ".../transformers/models/bert/modeling_bert.py", line 484, in forward def forward( ... File ".../transformers/models/bert/modeling_bert.py", line 416, in forward def forward( ... File ".../transformers/models/bert/modeling_bert.py", line 275, in forward def forward( ... File ".../transformers/models/bert/modeling_bert.py", line 351, in forward attention_scores = attention_scores + attention_mask ``` As noted in https://github.com/pytorch/pytorch/pull/95848#discussion_r1134397096, we would like to change this to record function calls instead, like conventional stack traces do. This diff makes this change. The above stack now looks like the following, which is way more helpful at a glance to understand what's going on. ``` File ".../scripts/avik/pt2/example.py", line 408, in forward bert_out = self.bert(**x) ... File ".../transformers/models/bert/modeling_bert.py", line 1021, in forward encoder_outputs = self.encoder( ... File ".../transformers/models/bert/modeling_bert.py", line 610, in forward layer_outputs = layer_module( ... File ".../transformers/models/bert/modeling_bert.py", line 496, in forward self_attention_outputs = self.attention( ... File ".../transformers/models/bert/modeling_bert.py", line 426, in forward self_outputs = self.self( ... 
File ".../transformers/models/bert/modeling_bert.py", line 351, in forward attention_scores = attention_scores + attention_mask ``` Differential Revision: [D44101882](https://our.internmc.facebook.com/intern/diff/D44101882/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96882 Approved by: https://github.com/ezyang	2023-03-17 00:06:16 +00:00
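The difference between the two stack styles above comes down to which location is recorded when tracing enters a function: the `def` line of the callee, or the line in the caller that made the call. The following sketch illustrates both with `traceback.FrameSummary`; the helper names are hypothetical and the use of `sys._getframe` is an assumption for the example, not how Dynamo obtains the location.

```python
import sys
import traceback


def definition_summary(fn):
    # Old behaviour: point at the `def` line of the function being entered.
    code = fn.__code__
    return traceback.FrameSummary(
        code.co_filename, code.co_firstlineno, code.co_name, lookup_line=False
    )


def call_site_summary():
    # New behaviour: point at the line in the caller that made the call.
    caller = sys._getframe(1)
    return traceback.FrameSummary(
        caller.f_code.co_filename,
        caller.f_lineno,
        caller.f_code.co_name,
        lookup_line=False,
    )


def inner():
    return "result"


def outer():
    site = call_site_summary()  # stands in for "about to call inner()"
    inner()
    return site


old_style = definition_summary(inner)
new_style = outer()
```

Here `old_style` names `inner` at its definition line, while `new_style` names `outer` at the line inside its body, which is what a conventional traceback would show.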
Avik Chaudhuri	178d2a38e0	debug shape guards (#95848 ) Adds logging when shape guards are added and when symbols are specialized to constants. Differential Revision: [D43719743](https://our.internmc.facebook.com/intern/diff/D43719743/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95848 Approved by: https://github.com/ezyang	2023-03-14 16:05:28 +00:00
Michael Voznesensky	d7db5b05b4	Context manager to push/pop frame summaries (#96054 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96054 Approved by: https://github.com/avikchaudhuri, https://github.com/ezyang	2023-03-08 04:01:49 +00:00
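A context manager that pushes a frame summary on entry and pops it on exit might look like the sketch below. `_frame_stack`, `current_frame`, and `extract_stack` are hypothetical names for this example; the commit does not show the real API, so treat this as an illustration of the pattern only.

```python
import contextlib
import traceback

_frame_stack = []  # hypothetical stack held by the tracing context


@contextlib.contextmanager
def current_frame(summary):
    _frame_stack.append(summary)
    try:
        yield
    finally:
        # Pop even if tracing raises, so the stack stays balanced.
        _frame_stack.pop()


def extract_stack():
    # Return a copy so callers cannot mutate the live stack.
    return list(_frame_stack)


outer = traceback.FrameSummary("model.py", 10, "forward", lookup_line=False)
inner = traceback.FrameSummary("layer.py", 42, "forward", lookup_line=False)

with current_frame(outer):
    with current_frame(inner):
        lines_inside = [f.lineno for f in extract_stack()]
stack_after = extract_stack()
```

Using `try`/`finally` inside the generator is the design choice that matters: an exception during tracing cannot leave a stale frame behind.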
Andrew Gu	cbac56e244	[BE] Simplify `Source.is_nn_module`; add some types (#95292 ) I am still reading Dynamo source code... This is an easy PR to simplify `Source.is_nn_module()` to reuse `GuardSource.is_nn_module()` instead of having the `in (...)` check implemented twice. While simplifying that, I thought I might as well add some type annotations for `Source` methods. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95292 Approved by: https://github.com/ezyang	2023-02-22 22:33:58 +00:00
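The simplification described in #95292 amounts to delegating from the source to its guard source instead of repeating the membership test. A minimal sketch follows; the enum values and the `ModuleAttrSource` subclass are assumptions made for the example.

```python
import enum


class GuardSource(enum.Enum):
    # Member values are illustrative assumptions.
    LOCAL = 0
    GLOBAL = 1
    LOCAL_NN_MODULE = 2
    GLOBAL_NN_MODULE = 3

    def is_nn_module(self) -> bool:
        # The `in (...)` check lives here and only here.
        return self in (
            GuardSource.LOCAL_NN_MODULE,
            GuardSource.GLOBAL_NN_MODULE,
        )


class Source:
    def guard_source(self) -> GuardSource:
        raise NotImplementedError

    def is_nn_module(self) -> bool:
        # Delegate instead of repeating the membership check.
        return self.guard_source().is_nn_module()


class ModuleAttrSource(Source):  # hypothetical subclass for the example
    def guard_source(self) -> GuardSource:
        return GuardSource.LOCAL_NN_MODULE


source_is_module = ModuleAttrSource().is_nn_module()
```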
Edward Z. Yang	89e16c4f18	Assume sympy is always installed (#94903 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/94903 Approved by: https://github.com/Skylion007, https://github.com/malfet	2023-02-16 14:09:58 +00:00
Edward Z. Yang	f8740db410	Properly resolve source_ref when constructing shape guards (#91058 ) Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when https://github.com/pytorch/pytorch/pull/89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91058 Approved by: https://github.com/voznesenskym	2022-12-30 05:56:56 +00:00
Edward Z. Yang	bcf15cd93b	Store source, not sname, in Symbol (#91057 ) I'm going to need this in the follow up PR. Instead of storing only Source.name() in Symbol, I now store a full on Source. Lots of replumbing reoccurs. In particular:
- Move Source to torch._guards to break cycles
- I have to add TensorPropertySource and NegateSource to handle x.size()[0] and -x codegen that I was doing with string manipulation previously
- I tighten up invariants so that I never pass source=None; instead I pass ConstantSource (these are constant sources right) and test for that rather than source being missing. I think this is more parsimonious
- Some mypy wobbles from new imports
I didn't move LocalSource and friends to torch._guards, but I ended up needing to access them in a few places. The main annoyance with moving these is that then I also need to move the bytecode codegen stuff, and that's not so easy to move without bringing in the kitchen sink. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91057 Approved by: https://github.com/albanD, https://github.com/voznesenskym, https://github.com/zou3519	2022-12-30 05:56:56 +00:00
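To illustrate why structured sources replace string manipulation for expressions such as `x.size()[0]` and `-x`, the sketch below composes small source objects and derives the name from them. The class names echo those mentioned in the commit, but the fields and the exact name format are assumptions made for this example.

```python
import dataclasses
import enum


class TensorProperty(enum.Enum):
    SIZE = "size"
    STRIDE = "stride"


@dataclasses.dataclass(frozen=True)
class LocalSource:
    local_name: str

    def name(self) -> str:
        return f"L[{self.local_name!r}]"


@dataclasses.dataclass(frozen=True)
class TensorPropertySource:
    base: LocalSource
    prop: TensorProperty
    idx: int

    def name(self) -> str:
        # e.g. the first size of the base tensor
        return f"{self.base.name()}.{self.prop.value}()[{self.idx}]"


@dataclasses.dataclass(frozen=True)
class NegateSource:
    base: TensorPropertySource

    def name(self) -> str:
        return f"-{self.base.name()}"


size0 = TensorPropertySource(LocalSource("x"), TensorProperty.SIZE, 0)
negated = NegateSource(size0)
```

Because each wrapper only knows how to render itself around its base, nesting works without any parsing of previously built strings.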
PyTorch MergeBot	b68fd7e319	Revert "Store source, not sname, in Symbol (#91057 )" This reverts commit `88c581be87`. Reverted https://github.com/pytorch/pytorch/pull/91057 on behalf of https://github.com/atalman due to causing internal build failures	2022-12-21 22:33:15 +00:00
Edward Z. Yang	88c581be87	Store source, not sname, in Symbol (#91057 ) I'm going to need this in the follow up PR. Instead of storing only Source.name() in Symbol, I now store a full on Source. Lots of replumbing reoccurs. In particular: - Move Source to torch._guards to break cycles - I have to add TensorPropertySource and NegateSource to handle x.size()[0] and -x codegen that I was doing with string manipulation previously - I tighten up invariants so that I never pass source=None; instead I pass ConstantSource (these are constant sources right) and test for that rather than source being missing. I think this is more parsimonious - Some mypy wobbles from new imports. I didn't move LocalSource and friends to torch._guards, but I ended up needing to access them in a few places. The main annoyance with moving these is that then I also need to move the bytecode codegen stuff, and that's not so easy to move without bringing in the kitchen sink. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91057 Approved by: https://github.com/albanD, https://github.com/voznesenskym	2022-12-21 04:51:51 +00:00
Edward Z. Yang	57390116e0	Restructure ShapeEnv so it uses GuardBuilder.SHAPE_ENV directly (#91055 ) The idea is to make ShapeEnv guards less of a one-off special snowflake, and integrate it more closely with the regular builder infrastructure. But it is not so easy: the shape env code has to live after tensor match code, because we need to know that the values in question are tensors before we start matching on them. So we introduce a new `shape_env_code` field to put the special shape env code, so we can add it to the final constructed code after tensor. Everything else works the obvious way. There's a new ShapeEnvSource for constructing the singleton SHAPE_ENV guard that drives the shape env guard construction. I added some more docs and also made the printed code for guards include the enclosing lambda for more clarity. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/91055 Approved by: https://github.com/albanD, https://github.com/voznesenskym	2022-12-21 03:50:47 +00:00
Michael Voznesensky	b72caf311d	Introduce guardexpr, aot autograd guarding of duplicates into torch._guards (#90955 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90955 Approved by: https://github.com/ezyang	2022-12-18 03:05:47 +00:00
Michael Voznesensky	53e71fad8f	Add shape_env guards to tracing context (#90876 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90876 Approved by: https://github.com/Chillee, https://github.com/ezyang	2022-12-16 09:05:05 +00:00
Edward Z. Yang	eef019c14a	Lint rule to forbid direct use of logging.info/etc APIs (#90907 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90907 Approved by: https://github.com/jansel	2022-12-16 05:13:51 +00:00
Michael Voznesensky	6c8ef6a4c2	Add tracing context, Integrate dynamo guards into torch._guards (#90647 ) As defined here: https://docs.google.com/document/d/1oniZEgAaHE1IMByPRWRKbUHeaW06E2HMfCTCQyMRLek/edit# This PR creates a new structure, a TracingContext, whose lifecycle matches that of the traced frame. It carries on it a GuardsContext, and eventually, a FakeTensorMode. It is the source of truth of all accumulated guards. In this PR, we create the structure, and integrate it into dynamo. We do so by mapping OutputGraph's guards structure to its guard structure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90647 Approved by: https://github.com/ezyang	2022-12-14 07:35:32 +00:00
Michael Voznesensky	5adc18dcbc	Shape guard structure (#90679 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/90679 Approved by: https://github.com/ezyang	2022-12-12 09:50:00 +00:00
Michael Voznesensky	11442accc6	Make torch._guards, shuffle structures around for migration (#90636 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90636 Approved by: https://github.com/ezyang	2022-12-11 23:16:07 +00:00
PyTorch MergeBot	15a4c60383	Revert "Make torch._guards, shuffle structures around for migration (#90636 )" This reverts commit `933b6c4eed`. Reverted https://github.com/pytorch/pytorch/pull/90636 on behalf of https://github.com/huydhn due to Breaking lint on master. Please rebase and run lintrunner -a before re-merging the PR	2022-12-11 10:15:47 +00:00
Michael Voznesensky	933b6c4eed	Make torch._guards, shuffle structures around for migration (#90636 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90636 Approved by: https://github.com/ezyang	2022-12-11 06:04:17 +00:00

144 Commits