Enabling mypy everywhere is intractable at the moment: there are a lot of
errors, and mypy is also very slow for reasons we haven't tracked down. I just
want enough types to describe the public types for user compiler calls, typing
the `_C.dynamo` bindings along the way. This PR is a first step toward that.
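For flavor, a stub for one binding might look roughly like this (a hypothetical sketch, not the actual `_C.dynamo` interface):
```python
# torch/_C/_dynamo/eval_frame.pyi -- hypothetical sketch of a binding stub,
# so mypy can check user-facing calls into the C extension.
from types import FrameType
from typing import Any, Callable, Optional

DynamoCallback = Optional[Callable[[FrameType, int], Any]]

def set_eval_frame(callback: DynamoCallback) -> DynamoCallback: ...
```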
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89731
Approved by: https://github.com/suo
There is only one call site for compiler_fn, so we can safely delay wrapping
it for correctness verification until this point. This will help later when we
change the backend compiler calling convention to pass fake tensors
(though the correctness check still needs real tensors here).
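A minimal sketch of the pattern, with illustrative names rather than the actual dynamo internals:
```python
from typing import Callable

import torch

def wrap_verify_correctness(compiler_fn: Callable) -> Callable:
    """Hypothetical wrapper: compare compiled output against eager.
    Assumes a single-tensor output for simplicity."""
    def verified(gm: torch.fx.GraphModule, example_inputs):
        compiled = compiler_fn(gm, example_inputs)
        def runner(*args):
            expected = gm(*args)       # eager reference (needs real tensors)
            actual = compiled(*args)
            assert torch.allclose(expected, actual), "correctness check failed"
            return actual
        return runner
    return verified

def call_compiler(compiler_fn, gm, example_inputs, verify_correctness=False):
    # The single call site: wrap lazily here instead of at registration time.
    if verify_correctness:
        compiler_fn = wrap_verify_correctness(compiler_fn)
    return compiler_fn(gm, example_inputs)
```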
This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392
but with fewer changes to the substantive logic. I only moved the relevant
inner implementation; there are no other changes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662
Approved by: https://github.com/voznesenskym
Fixes https://github.com/pytorch/torchdynamo/issues/1839
Should I do this for all backends or just inductor?
## Test
On a V100 I got from AWS
```python
from torch._dynamo import optimize
import torch
def fn(x, y):
    a = torch.cos(x)
    b = torch.sin(y)
    return a + b

new_fn = optimize("inductor")(fn)
a = new_fn(torch.Tensor(1), torch.Tensor(1))
print(a)
```
## New logs
```
(sourcetorch) ubuntu@ip-172-31-31-152:~/test$ python test.py
/home/ubuntu/pytorch/torch/_dynamo/eval_frame.py:318: UserWarning: Tensor cores are available but not enabled. Consider setting torch.backends.cuda.matmul.allow_tf32 == True in your python script for speedups
warnings.warn(
tensor([1.3717])
```
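For reference, acting on that warning is one line (note the warning text shows `==`, but enabling the flag is an assignment):
```python
import torch

# Allow TF32 for matmuls, as the warning suggests.
torch.backends.cuda.matmul.allow_tf32 = True
```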
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88844
Approved by: https://github.com/ngimel, https://github.com/mlazos, https://github.com/anijain2305
This is an API change, so please review carefully.
With this PR, torchdynamo returns an `OptimizedModule` class object, a subclass of `torch.nn.Module`, when asked to optimize a `nn.Module` object. Most of the methods are redirected to the original `nn.Module`, which is installed as `_mod` in the `OptimizedModule`.
This is helpful in many cases:
```python
mod = MockModule()
opt_mod = torch._dynamo.optimize()(mod)
print(opt_mod) # Works
opt_mod = opt_mod.to(device="cuda")
print(opt_mod) # Works
opt_mod(input) # Triggers a recompile if necessary; earlier we were shedding the TorchDynamo wrapper
opt_mod.parameters() # Refers to the original module
```
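A minimal sketch of the delegation pattern described above (illustrative only; the real class lives in `torch._dynamo.eval_frame`):
```python
import torch

class OptimizedModuleSketch(torch.nn.Module):
    """Illustrative sketch: keep the original module as `_mod`, compile
    its forward, and delegate everything else to the original."""

    def __init__(self, mod: torch.nn.Module, dynamo_ctx):
        super().__init__()
        self._mod = mod                          # original module kept intact
        self.forward = dynamo_ctx(mod.forward)   # compiled entry point

    def __getattr__(self, name):
        if name == "_mod":
            return super().__getattr__(name)
        return getattr(self._mod, name)          # delegate everything else
```
Because `_mod` is registered as a submodule, methods like `parameters()` and `to()` naturally reach the original module's state.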
Topics unclear to me:
* I have overridden many methods to raise NotImplementedError. A careful review of those would be good.
* Hooks.
* For the optimized forward, should we call the torchdynamo optimization on `__call__` or on `forward`?
* What else should we test?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88629
Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/msaroufim
Summary:
Today, when we transform the captured graph in the last step of export(aten_graph=True), we construct a new graph that doesn't carry all the metadata that should be preserved, for example `node.meta["val"]`.
`meta["val"]` is important for writing passes and analyses on the graph later in the pipeline, so we may want to preserve it on placeholder nodes.
Test Plan: test_export.py:test_export_meta_val
Differential Revision: D41110864
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88651
Approved by: https://github.com/tugsbayasgalan, https://github.com/jansel
Previously, a check applied the DDP optimizer only on frames named "forward".
But on hf_T5_large, a graph break produces frames like:
```
<graph break in _shift_right>
<graph break in forward>
```
So instead, apply the DDP optimizer on all frames.
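A schematic of the gating change (illustrative, not the exact dynamo source):
```python
def should_apply_ddp_optimizer(frame_code_name: str) -> bool:
    # Before: resume frames like "<graph break in forward>" were skipped.
    #     return frame_code_name == "forward"
    # After: every compiled frame qualifies.
    return True
```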
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87097
Approved by: https://github.com/wconstab