Commit Graph

13 Commits

Author SHA1 Message Date
Will Constable
77df2ca9b6 Special-case fsdp wrapped modules to be Unspecialized (#89330)
### Summary
Making dynamo treat the nn.Modules inside FSDP wrappers as 'Unspecialized'
results in dynamo-produced graphs where nn.Module parameters are inputs
to the graph rather than attributes of the outer GraphModule.

This helps FSDP since it forces dynamo to pick up the latest copy
of the parameters from the user's nn.Module (which FSDP mutates on every pre_forward),
solving the ordering issue in backward.
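
As a rough illustration (hypothetical signatures, not dynamo's literal output), the difference in calling convention looks like this:
```
import torch

# Specialized: parameters are attributes captured on the GraphModule at
# trace time, so a stale view can get baked into the graph.
def specialized_forward(gm, x):
    return torch.nn.functional.linear(x, gm.weight, gm.bias)

# Unspecialized: parameters are graph *inputs*, re-read from the user's
# nn.Module on every call, so freshly-created views are picked up.
def unspecialized_forward(x, weight, bias):
    return torch.nn.functional.linear(x, weight, bias)
```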

### Details
Imagine this toy model
```
import torch
import torch.nn as nn

class MyModule(torch.nn.Module):
    def __init__(self, a, b):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(a, b),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            MyModule(10, 10000),
            MyModule(10000, 1000),
            MyModule(1000, 5),
        )

    def forward(self, x):
        return self.net(x)
```
Where FSDP is recursively wrapped around each `MyModule` and the result is dynamo-compiled, with dynamo already configured to skip/break in FSDP code, you'd expect to get 3 compiled AOT functions, corresponding to the contents of each `MyModule`, with FSDP's communication ops running eagerly in between them.  This almost happens (everything works out fine in forward), but in backward there is an ordering issue.
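
A hedged sketch of that setup (distributed init omitted; exact FSDP wrapping and dynamo entry points vary by release):
```
import torch
import torch._dynamo
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes torch.distributed.init_process_group() has already run.
model = ToyModel()
for i, layer in enumerate(model.net):
    model.net[i] = FSDP(layer)  # wrap each MyModule separately

# Dynamo is configured elsewhere to skip/graph-break inside FSDP code,
# so each MyModule body becomes its own compiled AOT function.
opt_model = torch._dynamo.optimize("aot_eager")(model)
```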

FSDP creates a flat buffer for all the parameters that are bucketed together, and then creates views into this buffer to replace the original parameters.  On each forward iteration, it creates new views after 'filling' the flat buffer with data from an all-gather operation, to 'unshard' the parameters from remote devices.  Dynamo traces the first such view and stores it in a compiled GraphModule.
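
A toy illustration of the flat-buffer-and-views mechanism (not FSDP's real code):
```
import torch

p1, p2 = torch.randn(4), torch.randn(6)
flat = torch.cat([p1, p2])        # the flat buffer for one bucket
view1 = flat.narrow(0, 0, 4)      # view that replaces p1
view2 = flat.narrow(0, 4, 6)      # view that replaces p2

# Each forward, FSDP refills `flat` (all-gather) and takes *new* views;
# a graph that captured view1/view2 keeps using the stale ones.
```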

During tracing, we see: (1) a view created for the first MyModule, (2) the first MyModule compiled, (3) and so on for the rest of the layers.

Then at runtime, we see: (A) a view created for the first MyModule (and orphaned), (B) the first compiled MyModule executed, using the old view, and so on.

This is a problem because we want backward hooks to run right after each compiled backward, but autograd executes those hooks in an order mirroring their execution order during forward.  Since we are forever using the views created during steps (1, 3, ..., N), which all happen before steps (A, B, ...), all the hooks end up running after all the compiled backwards.  Below is an illustration of the problem: a torchviz graph showing the two possible orderings of autograd, and a profile showing the view-backward ops happening after all the compiled backwards and before all the backward hooks.

<img width="2069" alt="image" src="https://user-images.githubusercontent.com/4984825/202828002-32dbbd15-8fc3-4281-93e9-227ab5e32683.png">
<img width="2069" alt="image" src="https://user-images.githubusercontent.com/4984825/202828632-33e40729-9a7f-4e68-9ce1-571e3a8dd2dd.png">

A solution is to make dynamo not specialize on these nn.Modules.  It is worth pointing out that this nn.Module specialization is de facto failing anyway: we are modifying `.parameters`, and this bypasses dynamo's `__setattr__` monkeypatch, which should have automatically kicked us out to Unspecialized and forced a recompile.
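
One way such a mutation can slip past the monkeypatch (an illustrative sketch, not FSDP's exact code path):
```
import torch

mod = torch.nn.Linear(2, 2)
new_w = torch.nn.Parameter(torch.zeros(2, 2))
# Writing into _parameters directly never calls mod.__setattr__,
# so dynamo's monkeypatch never sees the swap.
mod._parameters["weight"] = new_w
assert mod.weight is new_w
```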

After unspecializing, the new views (created during steps A, C, ...) are actually _used_ at runtime by the module, so their creation is interleaved with compiled execution, and autograd executes their backwards interleaved as well.

The new torchviz graph (this time with names added for the view tensors):
<img width="2043" alt="image" src="https://user-images.githubusercontent.com/4984825/202828480-d30005ba-0d20-45d8-b647-30b7ff5e91d3.png">

And a new profile showing the interleaving of compiled backwards and hooks, allowing the reduce-scatter communication to overlap with compute.
<img width="2293" alt="image" src="https://user-images.githubusercontent.com/4984825/202828533-bb20a041-19b8-499c-b3cf-02808933df47.png">

@jansel @davidberard98 @aazzolini @mrshenli @awgu @ezyang @soumith @voznesenskym @anijain2305

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89330
Approved by: https://github.com/davidberard98
2022-11-29 01:24:03 +00:00
Edward Z. Yang
6904324781 Remove fake_tensor_propagation (#89646)
You always have to run dynamo with fake tensors.
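
For reference, a minimal fake-tensor example (using the `torch._subclasses` location): ops execute on tensors that carry shape/dtype/device but no real storage.
```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    x = torch.empty(128, 128)
    y = x @ x
print(y.shape)  # torch.Size([128, 128]), computed without real data
```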

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646
Approved by: https://github.com/soumith
2022-11-25 03:27:32 +00:00
Edward Z. Yang
fc7dcb684a Run optimizer tests with fake tensors (#89643)
This is a slight regression: RAdam and Adagrad don't appear to
trace at all under fake tensors.  But I think this is a more accurate
reflection of the current state of affairs.

Along the way fix some problems on the fake tensor path.
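
A minimal sketch of what "tracing an optimizer under fake tensors" means here (SGD shown; per the note above, RAdam and Adagrad do not yet trace):
```
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    p = torch.empty(4, 4, requires_grad=True)
    p.grad = torch.empty_like(p)
    opt = torch.optim.SGD([p], lr=0.1)
    opt.step()  # the whole update runs on fake tensors
```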

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643
Approved by: https://github.com/anjali411
2022-11-24 22:46:49 +00:00
Edward Z. Yang
6fb6eb0a74 Support unspecialized integers with dynamic shapes (#89639)
Previously, we hackily wrapped unspecialized integers into
tensors and treated them as tensor inputs.  Sometimes, downstream
operations would not be able to deal with the tensor input.  Now,
we wrap them into SymInt, so more correct overload selection occurs.
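
Illustratively (whether an int is actually unspecialized depends on dynamo config and call pattern):
```
import torch
import torch._dynamo as dynamo

def f(x, n):
    # An unspecialized `n` is now traced as a SymInt rather than being
    # wrapped in a 0-dim tensor, so the int overloads are selected.
    return x + n

opt_f = dynamo.optimize("eager")(f)
print(opt_f(torch.randn(3), 4))
```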

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
Approved by: https://github.com/anjali411
2022-11-24 22:46:42 +00:00
Edward Z. Yang
94a88b53ed Remove fake_tensors_available (#89637)
As we are one repo now, they are always available.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637
Approved by: https://github.com/anjali411
2022-11-24 19:28:10 +00:00
Yanbo Liang
37e46a5035 [Dynamo] Fix several bugs & code refactor in RangeVariable (#89322)
Fix a bug found in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py
```
E       TypeError: 'list' object cannot be interpreted as an integer
E
E       from user code:
E          File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward
E           idx = torch.LongTensor(range(y.size(0)))
```
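
Eagerly the pattern works; the fix makes dynamo handle it too. For reference, `torch.arange` is the more idiomatic equivalent:
```
import torch

y = torch.randn(5, 3)
idx = torch.LongTensor(range(y.size(0)))          # pattern from the model
idx2 = torch.arange(y.size(0), dtype=torch.long)  # idiomatic equivalent
assert torch.equal(idx, idx2)
```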

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322
Approved by: https://github.com/jansel
2022-11-23 19:44:48 +00:00
Yanbo Liang
b72f5b9ae3 [Dynamo] Support typing.Mapping & Support function as argument (#88963)
These missing features come from https://github.com/pytorch/benchmark/pull/1302, where we'd like to enable E2E hf_bert dynamo train/eval. The [HuggingFace accelerate library](https://huggingface.co/docs/accelerate/index), which that benchmark depends on, requires these improvements.
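
Roughly the kind of code these features unlock (a hedged sketch, not taken from the benchmark itself):
```
from typing import Callable, Mapping

import torch
import torch._dynamo as dynamo

def run(fn: Callable[[torch.Tensor], torch.Tensor],
        cfg: Mapping[str, float], x: torch.Tensor):
    # A Mapping input and a function passed as an argument,
    # both traced by dynamo after this change.
    return fn(x) * cfg["scale"]

opt_run = dynamo.optimize("eager")(run)
print(opt_run(torch.relu, {"scale": 2.0}, torch.randn(4)))
```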

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88963
Approved by: https://github.com/jansel
2022-11-17 06:57:42 +00:00
Michael Voznesensky
06ce1338bc [dynamo] Port all pytorch/dynamo and test/dynamo pieces over from symbolic-shapes branch (#88768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88768
Approved by: https://github.com/jansel, https://github.com/ezyang
2022-11-13 04:50:21 +00:00
PyTorch MergeBot
ba4d5aae06 Revert "rename DisableTorchFunction to DisableTorchFunctionSubclass (#88218)"
This reverts commit 7f28be10e5.

Reverted https://github.com/pytorch/pytorch/pull/88218 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41211901
2022-11-11 19:13:05 +00:00
samdow
7f28be10e5 rename DisableTorchFunction to DisableTorchFunctionSubclass (#88218)
First half of #87990. This doesn't change any behavior; it is just a rename.
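
After the rename, the subclass-only disable reads like this (sketch of the common `__torch_function__` pattern):
```
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Previously spelled DisableTorchFunction; the new name clarifies
        # that only subclass __torch_function__ handling is disabled.
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)
```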

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88218
Approved by: https://github.com/ezyang, https://github.com/zou3519
2022-11-10 14:51:13 +00:00
Yanbo Liang
56b150ac63 [Dynamo] Support optimizing over any Tensor with requires_grad = True (#87141)
Fixes https://github.com/pytorch/torchdynamo/issues/1604

Re-submit for https://github.com/pytorch/torchdynamo/pull/1646
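
A small sketch of the newly-supported case: a plain tensor (not an nn.Parameter) with requires_grad=True flowing through a compiled function.
```
import torch
import torch._dynamo as dynamo

def f(x, w):
    return (x @ w).sum()

w = torch.randn(3, 3, requires_grad=True)  # plain tensor, not a Parameter
out = dynamo.optimize("eager")(f)(torch.randn(2, 3), w)
out.backward()
print(w.grad.shape)
```
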
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87141
Approved by: https://github.com/jansel
2022-10-19 22:13:07 +00:00
Jason Ansel
054a2fd6c2 Sync changes from pytorch/torchdynamo (#87013)
This updates to:
6380959be2

Generated with:
https://github.com/pytorch/torchdynamo/blob/main/copy_to_core.sh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87013
Approved by: https://github.com/voznesenskym
2022-10-15 21:00:57 +00:00
Jason Ansel
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538
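
For downstream code, the move is effectively an import rename:
```
# Before (standalone repos):
#   import torchdynamo
#   import torchinductor
# After this PR (inside pytorch core):
import torch._dynamo
import torch._inductor
```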

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00