Summary:
Original commit changeset: 33650f7cb0fb
Original Phabricator Diff: D48833682
Test Plan: See T162942232 for how we figured out that this diff caused significant numeric difference.
Reviewed By: voznesenskym
Differential Revision: D49082219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108823
Approved by: https://github.com/xw285cornell
This PR:
1) Add device_mesh kwarg to FSDP. Remove init_device_mesh() from _runtime_utils.py, as device_mesh would be passed in by the user as a kwarg (see the usage sketch after this list).
2) Change the use_dtensor flag for state_dict_config and optim_state_dict_config to be private. If device_mesh is used with a sharded model/optim state dict, the _use_dtensor flag is set to True and the model/optim state dict returns a DTensor state_dict. Otherwise, _use_dtensor is set to False and the model/optim state dict returns a sharded_tensor state_dict.
3) Update _optim_utils.py, _shard_utils.py, and _state_dict_utils.py to add support for HSDP returning a 2D DTensor state_dict.
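A minimal usage sketch of the new kwarg. It assumes an already-initialized process group, an 8-GPU (2x4) HSDP layout, and the `init_device_mesh` helper; the import path and mesh shape are illustrative, not prescribed by this PR.
```
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def wrap_with_hsdp(model: nn.Module) -> FSDP:
    # 2 replica groups x 4 shards per group (illustrative layout).
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
    return FSDP(
        model.cuda(),
        device_mesh=mesh_2d,  # new kwarg; no init_device_mesh() call inside _runtime_utils.py
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    )
```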
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107533
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wanchaol
Add semantics for creating a buffer object that mirror creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type indicates whether the buffer should be persistent or not. Other non-test changes make the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.
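A small sketch of the intended usage (assuming the new class is exposed as `nn.Buffer`; the attribute name below is just an example):
```
import torch
import torch.nn as nn

class RunningMean(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Buffer registers it just like register_buffer() would.
        self.mean = nn.Buffer(torch.zeros(10), persistent=True)
        # Equivalent to: self.register_buffer("mean", torch.zeros(10), persistent=True)

m = RunningMean()
print("mean" in dict(m.named_buffers()))  # True
```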
Fixes #35735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
Enables additional inductor UTs on ROCm and removes outdated skips.
I have also removed a group of failures in `test_torchinductor_opinfo` that are now passing on both CUDA and ROCm:
```
- # The following 3 tests fail on CUDA with AssertionError: expected size 5==5, stride 5==1 at dim=0
- # linalg._svd's return value has different strides on CUDA vs CPU which causes this
- # In test_meta.py there is a mechanism to skipping strides checks for some ops
- # (including _linalg_svd), possibly we should have something similar here
- "linalg.cond": {f32, f64},
- "linalg.svdvals": {f32, f64},
- "linalg.matrix_rank": {f32, f64},
- "linalg.svd": {f32, f64},
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104624
Approved by: https://github.com/malfet
PR to enable default workflow PyTorch 2.0 unit tests for the ROCm stack.
- Enables all the dynamo unit test suites
- Enables some of the inductor unit test suites
- `test_config`
- `test_cpp_wrapper` (cpu only)
- `test_minifier`
- `test_standalone_compile`
- `test_torchinductor_dynamic_shapes`
- `test_torchinductor_opinfo`
- `test_torchinductor`
- `test_triton_wrapper`
- Introduces `TEST_WITH_ROCM` conditions for the unit test skip/fail dictionaries in `test_torchinductor_dynamic_shapes.py` and `test_torchinductor_opinfo.py`
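A hedged sketch of what such a condition might look like (the dictionary name and entry below are illustrative, not the exact ones in those files):
```
from torch.testing._internal.common_utils import TEST_WITH_ROCM

# Hypothetical skip/fail dictionary gated on ROCm; the real dictionaries map
# test names to expected-failure/skip metadata used by the inductor test files.
test_failures = {}
if TEST_WITH_ROCM:
    test_failures["test_some_op_cuda"] = "skipped on ROCm: known numerical mismatch"
```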
Note: this PR follows on from the discussions on the previous UT enablement PR https://github.com/pytorch/pytorch/pull/97988. We have opted to enable only a few inductor suites for now to ease the upstreaming effort, as these files are changing very quickly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100981
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
This PR fixes capturing static methods for FSDP-managed modules. Previously, if a static method was invoked using `self.<staticmethod>`, then Dynamo would pass `self` twice to the method, causing a graph break due to the method being "unsupported". The fix checks for `staticmethod` and uses `UserFunctionVariable` instead of `UserMethodVariable`, which handles the correct calling convention.
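A minimal repro sketch of the pattern that previously graph-broke (the module below is hypothetical, not taken from the PR):
```
import torch

class Scaler(torch.nn.Module):
    @staticmethod
    def _scale(x):
        return x * 2

    def forward(self, x):
        # Before this fix, Dynamo passed `self` as an extra leading argument
        # to the static method here, making the call "unsupported".
        return self._scale(x)

compiled = torch.compile(Scaler())
print(compiled(torch.randn(3)))
```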
This fixes FSDP + PT2 on HuggingFace's `T5ForConditionalGeneration`, which otherwise reports an error like the following based on the most recent trunk:
```
Output 0 of AsStridedBackward0 is a view of a view which was created in no_grad mode and is being modified inplace with grad mode enabled.
```
This is in reference to the `scores` tensor in `scores += position_bias_masked` ([code](a0ae2310ec/src/transformers/models/t5/modeling_t5.py (L559))).
I am not clear if this PR's fix is actually masking a different problem though. I wonder if there are edge cases with respect to Dynamo resuming execution and input mutations. Possibly, this PR only sidesteps the problem because there is no more recompilation at the static method `_relative_position_bucket()` ([code](a0ae2310ec/src/transformers/models/t5/modeling_t5.py (L443))).
In `UserDefinedObjectVariable.var_getattr()`, there is an existing branch:
e5291e633f/torch/_dynamo/variables/user_defined.py (L395-L398)
I am not clear on when this branch can be triggered since if `subobj` is a static method, it still takes the `FunctionTypes` branch:
e5291e633f/torch/_dynamo/variables/user_defined.py (L403-L404)
To preserve backward compatibility, the current version of this PR only modifies this `FunctionTypes` branch to differentiate between `staticmethod` and not `staticmethod`.
The PR that added this `FunctionTypes` branch is https://github.com/pytorch/pytorch/pull/92050/, and I checked that the added test `test_torch_distributions_functions()` only exercises the non-`staticmethod` case (since `Independent.log_prob` is not a `staticmethod`).
The last commit in `pytorch` that touched the `staticmethod` branch before https://github.com/pytorch/pytorch/pull/92050/ was the move from the `torchdynamo` repo into `pytorch`, so I cannot easily tell which test cases it corresponds to.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100117
Approved by: https://github.com/anijain2305
**Context**
The existing check to see if an arg is duped is `if dupe_arg_pos != kept_pos:`. However, this incorrectly considers every arg after a true duped arg to also be a duped arg.
Consider `flat_args = [a, b, b, c]`, where indices `1` and `2` are duped.
- `add_dupe_map = {0: 0, 1: 1, 2: 1, 3: 2}`
- For `dupe_arg_pos=2, kept_pos=1`, `2 != 1`, so the check correctly identifies the second `b` to be a duped arg.
- For `dupe_arg_pos=3, kept_pos=2`, `3 != 2`, so the check incorrectly identifies the `c` to be a duped arg.
Indeed, if there were more args like `[a, b, b, c, d, e, ...]`, every arg after the second `b` will be considered a duped arg since its `kept_pos` will always be 1 lower than its `dupe_arg_pos`.
**Overview**
This PR changes `add_dupe_map` to be implemented as a `List[int]`, where the list index implicitly represents the `dupe_arg_pos` and the list element represents the `kept_pos`. We use a list to have stable in-order iteration and because we know the keys to be in `{0, 1, ..., len(flat_args) - 1}`.
With `add_dupe_map` as a list, an arg is a dupe iff its entry in `add_dupe_map` does not introduce a new, not-yet-seen index in the iteration. One way to check this is to count the number of unique args seen so far and compare against that.
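For concreteness, here is a small sketch of that check over the example above (variable names are illustrative, not the exact AOTAutograd code):
```
flat_args = ["a", "b", "b", "c"]
add_dupe_map = [0, 1, 1, 2]  # index = dupe_arg_pos, value = kept_pos

seen_unique = 0
for dupe_arg_pos, kept_pos in enumerate(add_dupe_map):
    # A dupe is an entry that does not introduce a new, not-yet-seen kept index.
    is_dupe_arg = kept_pos < seen_unique
    if not is_dupe_arg:
        seen_unique += 1
    print(dupe_arg_pos, flat_args[dupe_arg_pos], is_dupe_arg)
# Only position 2 (the second "b") is reported as a dupe; "c" no longer is.
```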
This closes https://github.com/pytorch/pytorch/issues/98883, where now the guards change from
```
GUARDS ___guarded_code.valid
and ___check_type_id(L['self'], 93996836333040)
and ___check_obj_id(L['self'], 140119034997536)
and not ___are_deterministic_algorithms_enabled()
and ___check_tensors(L['x'])
and L['self']._buf is L['self']._buf_module._buf
and L['self']._buf_module._buf is L['self']._param
```
to the same set of guards without the final incorrect `L['self']._buf_module._buf is L['self']._param` guard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98932
Approved by: https://github.com/ezyang
### Overview
This PR de-duplicates graph inputs in TorchDynamo, using the `Source` as the unique identifier for each input. This closes https://github.com/pytorch/pytorch/issues/98743 and https://github.com/pytorch/pytorch/issues/98625.
### Details
`VariableBuilder.wrap_tensor()` should return a `VariableTracker` for the passed-in `value: Tensor`. If `value` is duplicated, we should avoid calling `OutputGraph.create_graph_input()` and `OutputGraph.add_grapharg()`.
- Note that `create_graph_input()` and `add_grapharg()` are not 1:1. For a constant source and either `wrap_sym()` or `wrap_unspecialized_primitive()`, TorchDynamo still calls `create_graph_input()` but not `add_grapharg()`.
- Note that `create_graph_input()` should be called before constructing the corresponding `VariableTracker`. TorchDynamo needs the `fx.Proxy` object to pass to `wrap_fx_proxy()`.
In this PR, the `OutputGraph` saves an additional mapping `input_source_to_var` from each graph input's `Source` to its `VariableTracker`, which works because `Source` is now hashable. This mapping should be updated each time `create_graph_input()` is called. However, since we must construct the `VariableTracker` after `create_graph_input()` returns, we must have a separate call to the `OutputGraph` to update the mapping.
If anyone has any suggestion on how to coalesce this logic and avoid having to remember to update `input_source_to_var` for each `create_graph_input()`, I would love to hear it.
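A stripped-down sketch of the bookkeeping described above (class and method names are simplified stand-ins for the actual Dynamo code):
```
class OutputGraphSketch:
    def __init__(self):
        self.input_source_to_var = {}  # Source -> VariableTracker

    def wrap_tensor(self, source, build_var):
        existing = self.input_source_to_var.get(source)
        if existing is not None:
            # Duplicated input: reuse the VariableTracker instead of calling
            # create_graph_input()/add_grapharg() a second time.
            return existing
        var = build_var()  # built only after create_graph_input() would return
        self.input_source_to_var[source] = var
        return var

graph = OutputGraphSketch()
a = graph.wrap_tensor("L['x']", object)
b = graph.wrap_tensor("L['x']", object)
assert a is b  # same Source -> same tracker, no duplicate graph input
```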
<details>
<summary> Alternate Approach</summary>
Initially, I tried having TorchDynamo construct a new but equivalent `VariableTracker` for the duplicated tensor. However, I abandoned this approach after hitting an assertion in `def wrap_fx_proxy_cls()` due to `"example_value"` already being in the proxy node's metadata, because we were reusing the primary tensor's `Proxy` object. Reusing the exact `VariableTracker` also seems less error-prone than constructing a new but identical one.
</details>
### Testing
#### Global Variable Test
```
import torch
@torch.compile()
def f():
    return x + x
x = torch.randn(3)
f()
```
Before:
```
====== Forward graph 0 ======
<eval_with_key>.6 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[3], arg1_1: f32[3]):
        # File: /data/users/ezyang/b/pytorch/ff.py:5, code: return x + x
        add: f32[3] = torch.ops.aten.add.Tensor(arg0_1, arg1_1); arg0_1 = arg1_1 = None
        return (add,)
```
After (only `arg0_1` and no more `arg1_1`):
```
====== Forward graph 0 ======
<eval_with_key>.4 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[3]):
        # File: dynamo/test_dup_global.py:8, code: return x + x
        add: f32[3] = torch.ops.aten.add.Tensor(arg0_1, arg0_1); arg0_1 = None
        return (add,)
```
#### FSDP Test
Before, we would error on
```
File "/.../pytorch/torch/_guards.py", line 244, in __post_init__
assert self.input_source_a != self.input_source_b
```
and now there is no error.
---
The rename from `name_to_input` to `input_name_to_proxy` is not part of the core logic change and is a remnant from initial attempts. I can undo it later if desired, but I also feel that the new name is more informative. It also fixes the type annotation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98775
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
GraphModules that were created during DDPOptimizer graph breaking
lacked `compile_subgraph_reason`, which caused an exception when
running .explain().
Now the reason is provided and users can use .explain() to find out
that DDPOptimizer is causing graph breaks.
Fixes #94579
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94749
Approved by: https://github.com/voznesenskym
This optimizes an edge case where some compute-only ops (e.g. add)
could end up in an orphan graph at the input side due to the bucket
for the next graph being full already. The fix is to fuse this
graph (which is "empty" in parameter count) together with the adjoining
"full" bucket.
Note: I encountered this when trying to repro some suspected duplicate
argument errors, but this is unrelated and I have not yet repro'd
a duplicate arg issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93162
Approved by: https://github.com/davidberard98
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #92986
When running compiled submods for the purpose of producing outputs to pass
to the compilation step for the next submod, we use fake parameters and
assume fake inputs, but we forgot to activate our fake_mode during execution.
This caused certain edge cases where tensors other than activations or parameters
got created during execution, such as scalar->tensor expansion in the case
of executing torch.where(tensor, scalar, scalar).
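A small sketch of the failure mode and the fix's intent (illustrative only; the scalar branches of `torch.where` below stand in for tensors created mid-execution):
```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

fake_mode = FakeTensorMode()
with fake_mode:  # the fix: keep the mode active while running the submod
    cond = torch.zeros(4, dtype=torch.bool)  # fake under the active mode
    out = torch.where(cond, 1.0, 0.0)        # scalar->tensor expansion stays fake
print(type(out).__name__)  # FakeTensor
```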
Also add a test and clarify behavior of DDPOptimizer via comments.
Fixes#92941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92986
Approved by: https://github.com/bdhirsh
- This is a strict requirement given the way dynamo+FSDP is implemented,
but isn't convenient to assert.
- By plumbing the use_orig_params field onto all wrapped modules, we can do this assertion inside Dynamo (sketched below)
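A hedged sketch of the assertion this enables (the attribute names below are assumptions about what gets plumbed onto wrapped modules):
```
import torch.nn as nn

def assert_dynamo_compatible_fsdp(root: nn.Module) -> None:
    for submod in root.modules():
        if getattr(submod, "_is_fsdp_managed_module", False):      # assumed flag
            assert getattr(submod, "_fsdp_use_orig_params", False), (  # assumed flag
                "Dynamo + FSDP requires use_orig_params=True"
            )
```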
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89523
Approved by: https://github.com/awgu
It's a lot easier to debug problems in the Dynamo optimization pass if
you aren't actually triggering a multiprocessing run. Keep these tests
around.
I think the other tests can probably get this treatment too, leaving
this to future work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721
Approved by: https://github.com/voznesenskym
Dynamo+AotAutograd needs a way to wrap all tensors (whether
inputs or params/buffers) in FakeTensor wrappers, and
FSDP's mangling of parameters hides them from this wrapping.
This PR unblocks running hf_bert and hf_T5 with FSDP under dynamo, whether using recursive wrapping around transformer layers or only applying FSDP around the whole model. Perf/memory validation and possibly optimization is the next step.
`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager --fsdp_wrap`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager --fsdp_wrap`
The problem:
Dynamo (actually AOTAutograd) trips up with FSDP because it must
wrap all input tensors in FakeTensor wrappers, and it only knows
to wrap graph inputs or named_(parameters, buffers). FSDP's
pre-forward hook sets views into the flat parameter (which are not nn.Parameters)
as attrs on the module with the same names as the original params, but
they will not show up in named_parameters().
- in use_orig_params mode, FSDP still de-registers
params during pre-forward hook, then re-registers them
post-forward
- during forward (between the hooks), the params are setattr'd
on the module as regular view tensors, not nn.Parameters
- note: use_orig_params is the recommended way to use FSDP,
and use_orig_params=False is being deprecated. So I only consider
use_orig_params=True for this enablement
The solution:
- adding them to named_buffers is not possible because it interferes
with how FSDP's `_apply` works
- since they are not actual nn.parameters, register_parameter will
complain about registering them
- simply setting `module._parameters[name] = view` seems to be a viable
workaround, despite being hacky, and FSDP code does modify _parameters
directly already.
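A toy illustration of the workaround in the last bullet (this mirrors the hack on a plain `nn.Linear`, not the actual FSDP code paths):
```
import torch
import torch.nn as nn

mod = nn.Linear(4, 4)
flat = torch.randn(16)            # stand-in for FSDP's flat parameter
view = flat.view(4, 4)            # a view, not an nn.Parameter

del mod._parameters["weight"]     # FSDP de-registers the original param
mod._parameters["weight"] = view  # hacky re-registration as a plain tensor

# The view is now visible to named_parameters(), so Dynamo/AOTAutograd can
# find and fake-wrap it like any other parameter.
print(type(dict(mod.named_parameters())["weight"]))  # <class 'torch.Tensor'>
```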
Note: Manual checkpointing still isn't working with FSDP+dynamo,
so that will have to be addressed in a follow up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88781
Approved by: https://github.com/ezyang, https://github.com/awgu
This test by itself isn't the end goal, but it is a minimal test that exercises multi-GPU, and the focus of the PR is the infra behind enabling that. I'll follow up with more tests using actual models etc.
cc @malfet @desertfire for awareness/feedback on the infra side.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87996
Approved by: https://github.com/aazzolini
- FSDP tests require nccl
- also run in inductor shard and skip inductor in distributed shard
- inductor shard has newer GPU and supports triton/inductor, but only runs on trunk
- distributed shard runs on PR, but inductor shard only runs on trunk/opt-in
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88133
Approved by: https://github.com/davidberard98