pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Shivam Raikundalia	0b81f700aa	[PT2/Profiler] Add Context Info to Torch-Compiled Regions (#132765 ) Summary: We want to add compile IDs and frames to each Torch-Compiled Region in order to help users cross reference the section they are checking alongside data obtained from tools, such as tlparse. This diff operates on the assumption that each graph section will enter and exit a CompileContext before it is run to either compile the graph or look it up in the cache. Based on this assumption, we can save the value of the graph section from the exited CompileContext in eval_frame.c using a Python C API. After this, we can create a new interface in cpp shim to wrap around the record_function in order to pass in the new keyword argument for "context". Test Plan: Enhance test_profiler_dynamo_compiled_region to look for kwinputs as well as a name to see that the context is now labeled. Also changed test to run graph with more contexts so that we test a wider range of profiling. Differential Revision: D60803317 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132765 Approved by: https://github.com/anijain2305	2024-08-27 04:55:04 +00:00
Animesh Jain	fee677eeb6	[fbode-testing][dynamo][reland][inline-inbuilt-nn-modules] Mark attri… (#134136 ) Shuai wants to test this internally before https://github.com/pytorch/pytorch/pull/133713 can go in. Creating a separate PR for ghimport. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134136 Approved by: https://github.com/yanboliang	2024-08-22 17:54:58 +00:00
PyTorch MergeBot	68425e68fe	Revert "[dynamo][reland][inline-inbuilt-nn-modules] Mark attributes of nn mod… (#133714 )" This reverts commit `e8d3c4be36`. Reverted https://github.com/pytorch/pytorch/pull/133714 on behalf of https://github.com/anijain2305 due to fails internally ([comment](https://github.com/pytorch/pytorch/pull/133714#issuecomment-2302171472))	2024-08-21 14:21:06 +00:00
Animesh Jain	e8d3c4be36	[dynamo][reland][inline-inbuilt-nn-modules] Mark attributes of nn mod… (#133714 ) Relands https://github.com/pytorch/pytorch/pull/132539 Relands https://github.com/pytorch/pytorch/pull/132736 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133714 Approved by: https://github.com/jansel	2024-08-20 05:57:52 +00:00
Edward Z. Yang	90d2593b3e	Revert #132806 , #132736 , #132539 , #132487 (#133570 ) This reverts commit `25df063f04`. This reverts commit `de00c79583`. This reverts commit `419b76c4ac`. This reverts commit `bc57d5b6ff`. Differential Revision: [D61335013](https://our.internmc.facebook.com/intern/diff/D61335013) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133570 Approved by: https://github.com/albanD, https://github.com/jansel, https://github.com/anijain2305	2024-08-15 20:54:21 +00:00
Animesh Jain	de00c79583	[dynamo][inline_inbuilt_nn_modules] Mark nn module tensor static for cudagraphs (#132736 ) Fixes https://github.com/pytorch/pytorch/issues/132714 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132736 Approved by: https://github.com/mlazos ghstack dependencies: #132538	2024-08-06 20:13:28 +00:00
Xuehai Pan	4226ed1585	[BE] Format uncategorized Python files with `ruff format` (#132576 ) Remove patterns ``, `test/`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: #132574	2024-08-04 17:13:31 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Animesh Jain	612ea35395	[dynamo] Introduce UnspecializedBuiltinNNModuleSource (#132312 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132312 Approved by: https://github.com/yanboliang ghstack dependencies: #132302, #132304	2024-08-01 06:21:05 +00:00
Animesh Jain	bcd1d2e832	[dynamo] Introduce UnspecializedNNModule guard source (#132304 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132304 Approved by: https://github.com/yanboliang ghstack dependencies: #132302	2024-08-01 04:35:43 +00:00
Animesh Jain	e772547d70	[dynamo][rename/refactor] Rename guard_source NN_MODULE to SPECIALIZED_NN_MODULE (#132302 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132302 Approved by: https://github.com/yanboliang	2024-08-01 04:35:43 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Oguz Ulgen	54b0006cb2	Evaluate symexprs on load path of cache not write (#128997 ) When caching is enabled, an internal model fails with ``` assert_size_stride(bmm_9, (17, s0, 512), (54784, 512, 1)) AssertionError: expected size 17==17, stride 57344==54784 at dim=0 ``` looking at this model, the exact problem is when the cache is hit on the forward graph, the generated code for backward fails since the strides of the outputs of forward, passed to backward as inputs, are not what we expected. This PR changes the evaluation logic so that we defer evaluation of output stride exprs to load path as opposed to eagerly doing it on save path. I have not been able to come up with a unit test repro for this problem. Differential Revision: [D58796503](https://our.internmc.facebook.com/intern/diff/D58796503) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128997 Approved by: https://github.com/ezyang	2024-06-20 08:55:12 +00:00
Xuehai Pan	dd143d44cc	[BE] enable UFMT for top-level files `torch/*.py` (#127707 ) Part of #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127707 Approved by: https://github.com/ezyang	2024-06-12 20:15:05 +00:00
Aaron Orenstein	ea614fb2b1	Flip default value for mypy disallow_untyped_defs [2/11] (#127839 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127839 Approved by: https://github.com/oulgen	2024-06-08 18:23:08 +00:00
Aaron Gokaslan	1dd42e42c4	[BE]: Try TCH autofixes on torch/ (#125536 ) Tries TCH autofixes and sees what breaks Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536 Approved by: https://github.com/ezyang	2024-05-05 23:13:59 +00:00
Animesh Jain	37c993546d	[dynamo][guards] Bug fix for set_export_info (#125275 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125275 Approved by: https://github.com/yanboliang	2024-05-01 03:46:26 +00:00
Edward Z. Yang	64491c0811	Restore CompileContext as well in backwards (#124626 ) This should fix many of the unknown compile id problems currently afflicting tlparse backwards analysis. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124626 Approved by: https://github.com/bdhirsh	2024-04-23 14:39:52 +00:00
Xuehai Pan	93e249969b	[BE] enable `ruff` rule `RSE` and remove useless parentheses in `raise` statements (#124261 ) Remove useless parentheses in `raise` statements if the exception type is raised with no argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261 Approved by: https://github.com/albanD	2024-04-17 19:29:34 +00:00
Jason Ansel	11e6f84ad8	[dynamo] Graph break on uninitialized nn.Module (#123790 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123790 Approved by: https://github.com/anijain2305 ghstack dependencies: #123700, #123705, #123786	2024-04-12 19:03:13 +00:00
Brian Hirsh	134e56fa33	inductor: log unique id to match output_code to aot graphs (#118647 ) I found it helpful to be able to see, given some inductor output code, which AOT graph it came from. When you have large models with multiple graphs floating around this can be difficult, so I added the aot_config.aot_id to the printed inductor output. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118647 Approved by: https://github.com/ezyang	2024-04-11 14:37:07 +00:00
Animesh Jain	1346ebf12e	[dynamo][guards] Delay DUPLICATE_INPUT guard because of incorrect ordering (#123605 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123605 Approved by: https://github.com/jansel ghstack dependencies: #123606	2024-04-10 07:30:02 +00:00
Jason Ansel	1e9a7df8fe	[dynamo] Compile time optimizations in tx.step() (#121790 ) `python benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py` - Before: `symbolic_convert_overhead_stress_test: 10.7s` - After: `symbolic_convert_overhead_stress_test: 8.6s` `tx.step()` is a small part of that benchmark, so likely the speedup in that isolated function is larger than the top line. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121790 Approved by: https://github.com/oulgen	2024-03-15 01:01:05 +00:00
Jason Ansel	7cc476ea16	[dynamo] Fix support for nn.Parameter constructor (part 1) (#120163 ) This captures calls to `torch.nn.Parameter` by lifting them to graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120163 Approved by: https://github.com/albanD, https://github.com/yanboliang ghstack dependencies: #121086	2024-03-11 05:14:42 +00:00
Joel Schlosser	dad1b76584	Introduce EphemeralSource for symbols that should be simplified out (#120948 ) Context: view fake-ification should handle closed-over state in ViewFuncs for use in view replay by: * fake-ifying tensors * symbolicizing SymInts This avoids invalid specialization during view replay. However, the symbols / tensors created as intermediates in the view chain should not stick around or be guarded on. This PR introduces an `EphemeralSource` intended to be used as a source for this purpose. It has the following properties: * Considered first to be simplified out in symbol simplification logic * Errors if guarded on Differential Revision: [D54561597](https://our.internmc.facebook.com/intern/diff/D54561597) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120948 Approved by: https://github.com/ezyang	2024-03-06 02:30:52 +00:00
Sam Larsen	06f8af30fa	Change FakeTensor serialization to consider only an _active_ FakeTensor mode (#120848 ) Summary: https://github.com/pytorch/pytorch/pull/108186 made some changes related to FakeTensor serialization such that saving and loading a tensor will give us a meta tensor, even if FakeTensor mode is not enabled. This means we can't properly save and load Tensors as part of Fx graph caching. This PR changes the logic to check if there's an _active_ FakeTensor mode. Test Plan: * New unit tests * Validated unit tests introduced in https://github.com/pytorch/pytorch/pull/108186 still pass Pull Request resolved: https://github.com/pytorch/pytorch/pull/120848 Approved by: https://github.com/eellison, https://github.com/thiagocrepaldi	2024-03-01 02:37:21 +00:00
Elias Ellison	d03b11ad5b	Pass inductor strides forward in ddp optimizer (#120523 ) # Note: Returning Fake Tensors on First AOT Autograd Call # # Inductor will optimize strides of outputs when it deems it profitable. # For instance, converting to channels last. When we split the graph here # into multiple inductor compilations, we need to make sure that the # output strides of one compilation is appropriately passed to the subsequent # compilations. However, the mapping from inductor output to dynamo output # is non-trivial due to aot_autograd's deduping, de-aliasing, mutation, re-writing, # subclass handling, etc. In order to replay all this logic we set a flag such that # the first invocation of inductor in aot_autograd will return Fake Tensors with # appropriate strides. Then, all of aot autograd's runtime logic is replayed. # This gives us the appropriately strided outputs here which will reflect runtime strides. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120523 Approved by: https://github.com/yf225, https://github.com/bdhirsh	2024-02-29 22:25:00 +00:00
Jason Ansel	01ec8df6d8	[Compiled Autograd] Introduce BackwardState capture (#120382 ) This adds support for backwards hooks that are both: 1) Interior to the graph; and 2) Dynamically generated (e.g. lambdas) We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo after the forwards runs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382 Approved by: https://github.com/xmfan	2024-02-28 20:36:47 +00:00
Animesh Jain	8a59f49da2	[dynamo][compile-time] Collect guard debug stack info only with logs enabled (#120520 ) Reduces backend=eager compile time from 33 to 19 seconds for `MobileBertForQuestionAnswering`. This also helps an internal model where guards.add function is taking 124 seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120520 Approved by: https://github.com/mlazos	2024-02-27 01:51:16 +00:00
Taras Tsugrii	2c8722182e	[dynamo][guards] Avoid unnecessary stack copies. (#119115 ) There is no need to make a `frame_summary_stack` copy in case it's not modified. Proposed change uses copy-on-write functional approach that is easy to understand and is more efficient in case `self.loc_in_frame` is `None` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119115 Approved by: https://github.com/Skylion007	2024-02-10 21:56:00 +00:00
Animesh Jain	0c3a1c893e	[dynamo] Setup the globals for guard_fn without a reference to f_locals (#118447 ) UPDATE - I changed the PR because from discussion with @jansel it was clear that someone else was holding on to a reference to f_locals. This PR now solves that problem first. I removed the eval_frame.c part because it was failing tests that use `exec` or `eval` with weird error like `no no locals found when storing 'math'`. I would debug that in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118447 Approved by: https://github.com/Skylion007, https://github.com/jansel ghstack dependencies: #118975, #118420	2024-02-05 05:39:39 +00:00
Taras Tsugrii	41b63b26c2	[dynamo] Fix incorrect docstring placements in _guards.py. (#119114 ) This makes them unavailable when using help and other tools accessing them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119114 Approved by: https://github.com/kit1980	2024-02-03 06:25:54 +00:00
lezcano	eb2bdfae88	Make variables in dict LazyTrackers (not lazily guarded yet) and avoid using DICT_KEYS guard (#117625 ) Make variables in dict lazy and remove DICT_KEYS guard. We build the keys of a dict depth-first and we rely on the guards of each element in the dict to create the correct guards. This allows us to remove the rather buggy DICT_KEYS guard and make the guard lazy. The guards are not completely lazy yet, as we instantiate them in `_HashableTracker._eq_impl` but it should be possible to make them truly lazy. Also, adding new types to the supported types within keys should be less error prone. This is marginally less efficient when we graph break, but in turn we should graph break much less. It also makes the dicts code easier to maintain (removes `is_hashable_python_var`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/117625 Approved by: https://github.com/jansel, https://github.com/peterbell10, https://github.com/anijain2305 ghstack dependencies: #117982, #118098, #117983	2024-02-02 14:38:08 +00:00
Edward Z. Yang	46712b019d	Enable local_partial_types (#118467 ) When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467 Approved by: https://github.com/Skylion007 ghstack dependencies: #118414, #118418, #118432	2024-01-28 13:38:22 +00:00
voznesenskym	081c5b3adc	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) (#114526 ) Summary: The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). 
We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this) cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng imported-using-ghimport Test Plan: Imported from OSS Reviewed By: huydhn, Chillee Differential Revision: D51566250 Pulled By: voznesenskym Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526 Approved by: https://github.com/Chillee, https://github.com/huydhn	2023-11-26 23:40:32 +00:00
PyTorch MergeBot	2f3beb715c	Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 )" This reverts commit `2ca1119d53`. Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))	2023-11-22 12:52:33 +00:00
voznesenskym	2ca1119d53	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). 
We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926 Approved by: https://github.com/ezyang, https://github.com/eellison	2023-11-20 23:06:37 +00:00
Jez Ng	5b95715bc0	Make {Tracing,Compile}Context.get() return non-optional type (#113535 ) They are used in many contexts that don't actually check if the returned type is `None`. I have also created `try_get()` for the cases where we do actually want an Optional type returned. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535 Approved by: https://github.com/ezyang ghstack dependencies: #113412	2023-11-14 04:31:12 +00:00
Jez Ng	a8cf04fd2a	[inductor] Make {output_graph,pad_mm}.py pass follow_imports typechecking (#113413 ) I changed OutputGraph.nn_modules' type to `Dict[str, Any]` because it seems that `register_attr_or_module` can populate it with essentially any type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113413 Approved by: https://github.com/Skylion007	2023-11-11 22:15:46 +00:00
Jez Ng	b0ede09682	[inductor] Make pattern_matcher.py pass follow_imports typechecking (#113409 ) Import following reveals that a good number of hints were wrong... Pull Request resolved: https://github.com/pytorch/pytorch/pull/113409 Approved by: https://github.com/Skylion007	2023-11-10 19:58:08 +00:00
Jason Ansel	9664190952	[dynamo] Eagerly install guards (#111415 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111415 Approved by: https://github.com/voznesenskym ghstack dependencies: #111306	2023-11-07 19:55:19 +00:00
Jason Ansel	4b8a5e1854	[dynamo] Remove VariableTracker.as_specialized (#112363 ) My local testing can't seem to find this function actually doing anything. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112363 Approved by: https://github.com/yanboliang	2023-10-30 20:07:55 +00:00
Peter Bell	bbd5b935e4	Use `pytree.tree_leaves` everywhere (#112324 ) This changes all the instances I could find of `tree_flatten(...)[0]` or `x, _ = tree_flatten` to use `tree_leaves`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324 Approved by: https://github.com/lezcano ghstack dependencies: #112327, #112323	2023-10-30 03:39:04 +00:00
lezcano	c8a5bb451e	Do not import sympy within torch._prims_common (#112034 ) This is the first of a few PRs that avoid importing SymPy at import time. The pitch here is that we (almost!) do not have SymPy on our API, so this should be feasible. This should speed-up torch imports by a good 15% as per https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589 In this PR we just move a few global imports into local imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034 Approved by: https://github.com/ezyang	2023-10-26 12:53:25 +00:00
voznesenskym	9455af58b5	[easy][dynamo] Cleanup guard builder selection (#111723 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111723 Approved by: https://github.com/jon-chuang, https://github.com/jansel	2023-10-21 10:48:32 +00:00
Animesh Jain	58637c4b43	[dynamo] Remove SuperSource (#110475 ) The motivation for removing this is already present in the pre-PR comments. Copying it ~~~ # NB - SuperSource is a weird one. # it is our only source with 2 bases, so we use the object # as the base, rather than the type, since an invocation # like super(Foo, foo) is represented here, the source object base is more spiritually # aligned with the instance, rather than the type. # This whole construction is questionable tho, and we should probably find a way to # avoid this exception to our otherwise nice source parentage invariant. ~~~ Instead of using super(a, b), we can use `type(b).__mro__[index]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110475 Approved by: https://github.com/jansel	2023-10-08 04:45:06 +00:00
chilli	005e8ddcb9	cache the hash construction on Guard (#110464 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110464 Approved by: https://github.com/zou3519, https://github.com/voznesenskym	2023-10-04 04:49:18 +00:00
Edward Yang	88600e7d2e	[RELAND] Force synced KJT to trace unbacked SymInt (#108960 ) (#109216 ) Summary: The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers, however, for KJTs, we know that these integers will definitely vary from run-to-run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1. The fix is to detect KJTs and treat these integers as unbacked integers. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples. The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked. Test Plan: ``` buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu ``` from aakhundov 1. first build feed_lower_benchmark: ``` buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark ``` 2. 
then run the lowering of the model with it: ``` TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace ``` cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0 From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/ From ge0405 baseline (without your diff): f477293168 your diff: f477292363 ``` buck2 test //caffe2/test/dynamo:test_dynamo_torchrec buck2 run 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup' ``` Differential Revision: D49236757 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109216 Approved by: https://github.com/voznesenskym	2023-09-18 14:39:44 +00:00
PyTorch MergeBot	1d32c9c7f2	Revert "Force synced KJT to trace unbacked SymInt (#108960 )" This reverts commit `f9a250c35b`. Reverted https://github.com/pytorch/pytorch/pull/108960 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/108960#issuecomment-1715850779))	2023-09-12 14:37:36 +00:00
Edward Yang	f9a250c35b	Force synced KJT to trace unbacked SymInt (#108960 ) Summary: The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers, however, for KJTs, we know that these integers will definitely vary from run-to-run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1. The fix is to detect KJTs and treat these integers as unbacked integers. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples. The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked. Test Plan: ``` buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu ``` from aakhundov 1. first build feed_lower_benchmark: ``` buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark ``` 2. 
then run the lowering of the model with it: ``` TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace ``` cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0 From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/ From ge0405 baseline (without your diff): f477293168 your diff: f477292363 Differential Revision: D49019987 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108960 Approved by: https://github.com/voznesenskym	2023-09-12 03:44:24 +00:00
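Commit 973037be6a in the log above argues that empty-collection literals are both cheaper and safer than factory calls such as `dict()`. The sketch below illustrates both points under stated assumptions: it is a standalone demonstration, not code from the PR, and the names `SOURCE`, `run`, and `opnames` are made up for this example. It executes the same two assignments in a namespace where the global name `dict` has been rebound, and it inspects the compiled bytecode for each expression.

```python
import collections
import dis

# The same two statements are executed in different namespaces below.
SOURCE = "d_literal = {}\nd_call = dict()\n"


def run(namespace):
    """Execute SOURCE in `namespace` and return the types of both objects."""
    exec(SOURCE, namespace)
    return type(namespace["d_literal"]), type(namespace["d_call"])


# With an empty namespace, `dict` resolves to the builtin.
plain_types = run({})

# With `dict` rebound, the factory call silently builds a different type,
# while the literal is unaffected because it never looks the name up.
shadowed_types = run({"dict": collections.OrderedDict})


def opnames(expression):
    """Return the opcode names generated for a single expression."""
    code = compile(expression, "<demo>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("{}")      # contains BUILD_MAP
call_ops = opnames("dict()")     # name lookup plus a call, no BUILD_MAP

print(plain_types, shadowed_types)
print(literal_ops)
print(call_ops)
```

The exact opcode sequence for the call varies between CPython versions, so the example only relies on `BUILD_MAP` being present for the literal and absent for the call.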

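Commit 5b95715bc0 in the log above changes `{Tracing,Compile}Context.get()` to return a non-optional type and adds `try_get()` for callers that do want an `Optional`. A minimal sketch of that accessor pattern follows. It is not the PyTorch implementation: `DemoContext` and `demo_context` are hypothetical names, and storing the active context in a class attribute is an assumption made only to keep the example short.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Iterator, Optional


class DemoContext:
    """Hypothetical context object with strict and lenient accessors."""

    _current: Optional["DemoContext"] = None

    def __init__(self, name: str) -> None:
        self.name = name

    @staticmethod
    def try_get() -> Optional["DemoContext"]:
        # Lenient accessor: callers must handle the None case themselves.
        return DemoContext._current

    @staticmethod
    def get() -> "DemoContext":
        # Strict accessor: the return type is non-optional, so a missing
        # context is reported as an error instead of leaking None.
        ctx = DemoContext._current
        if ctx is None:
            raise RuntimeError("no active DemoContext")
        return ctx


@contextmanager
def demo_context(name: str) -> Iterator[DemoContext]:
    """Install a DemoContext for the duration of the with-block."""
    previous = DemoContext._current
    DemoContext._current = DemoContext(name)
    try:
        yield DemoContext._current
    finally:
        DemoContext._current = previous


with demo_context("outer") as ctx:
    inside = DemoContext.get()          # same object as ctx
outside = DemoContext.try_get()         # None once the block has exited
```

Splitting the accessors this way lets a type checker accept `DemoContext.get().name` without a `None` check, while code that can run outside a context opts into `try_get()` explicitly.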

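Commit 58637c4b43 in the log above notes that `super(a, b)` can be replaced by `type(b).__mro__[index]`. The sketch below shows the equivalence for a simple single-inheritance case. The classes `Base` and `Child` and the helper `next_in_mro` are hypothetical and exist only for this illustration; how Dynamo computes `index` is not shown in the commit message, so taking the position right after the given class is an assumption.

```python
class Base:
    def describe(self):
        return "Base.describe"


class Child(Base):
    def describe(self):
        return "Child.describe"


def next_in_mro(cls, obj):
    """Return the class that follows `cls` in type(obj).__mro__."""
    mro = type(obj).__mro__
    index = mro.index(cls) + 1  # assumed meaning of `index` in the commit
    return mro[index]


obj = Child()
parent = next_in_mro(Child, obj)

# Calling through the MRO entry gives the same result as super(Child, obj).
via_mro = parent.describe(obj)
via_super = super(Child, obj).describe()
```

With multiple inheritance the two forms can diverge, because `super` continues searching the remainder of the instance's MRO, whereas an attribute lookup on a single MRO entry follows that class's own MRO.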
98 Commits