Fixes #117685.
This PR only preserves ConstantSource for built-in ops when all of the inputs are either constant tensors or Python constants.
It doesn't fundamentally solve the problem of preserving ConstantSource information through every operator that can potentially be constant folded.
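As a rough illustration of the rule above (not dynamo's actual code), the sketch below folds a built-in op only when every input is a known constant. `Tracked` and `call_builtin` are hypothetical names made up for this example.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class Tracked:
    """Hypothetical stand-in for a traced value (not a dynamo class)."""
    value: Any
    is_constant: bool  # True when the value is known at trace time


def call_builtin(op: Callable[..., Any], args: Sequence[Tracked]) -> Tracked:
    # Fold only when every input is a known constant; the result is then
    # itself marked constant so later ops can keep folding.
    if all(a.is_constant for a in args):
        return Tracked(op(*(a.value for a in args)), is_constant=True)
    # Any dynamic input makes the result dynamic; no value is computed here.
    return Tracked(None, is_constant=False)


y = Tracked(2.0, is_constant=True)    # a value treated as constant
p = Tracked(3.0, is_constant=True)    # a Python constant
x = Tracked(None, is_constant=False)  # a runtime input

folded = call_builtin(operator.lt, [y, p])
dynamic = call_builtin(operator.lt, [x, p])
```

Here `folded` stays constant, so a branch on it can be resolved at trace time, while `dynamic` cannot.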
For the following code in the issue:
```python
class Bob(torch.nn.Module):
    def __init__(self, p, val) -> None:
        super().__init__()
        self.p = p
        self.y = torch.nn.Parameter(torch.tensor(val))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # This only looks dynamic but it's actually a constant value
        if get_y(self.y) < self.p:
            return torch.cat([x, x])
        else:
            return x
```
The exported graph looks like the following:
```python
class GraphModule(torch.nn.Module):
    def forward(self, x):
        arg0: "f32[s0, s1]";
        arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
        l_x_ = arg0
        # File: /home/yidi/local/pytorch/test/dynamo/test_export.py:1498 in forward, code: return torch.cat([x, x])
        cat = torch.cat([l_x_, l_x_]); l_x_ = None
        return pytree.tree_unflatten([cat], self._out_spec)
```
Test Plan:
Added a new test for the given repro.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117704
Approved by: https://github.com/jansel, https://github.com/anijain2305
This PR refactors the distributed-related variables to use
DistributedVariable for common methods, so that things like
`python_type` work for all distributed variables.
Maybe we can add `as_python_constant` to DistributedVariable too? I
didn't add it in this PR, but if that makes sense I can update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117590
Approved by: https://github.com/voznesenskym
# Context
In some cases, we might want to build the `context_fn` with runtime-defined policies. One way of implementing this is to make `context_fn` be a partial, which holds the information that we want to pass. One concrete example is the [automatic policy selection from `xformers`](ad986981b1/xformers/checkpoint.py (L185)).
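A minimal sketch of the partial-based pattern, using only the standard library. `make_contexts` and `policy` are hypothetical names, and the sketch assumes that `context_fn` is expected to return a pair of context managers.

```python
import contextlib
import functools


def make_contexts(policy):
    # Hypothetical factory: returns the pair of context managers that a
    # checkpoint-style API would enter. `policy` is whatever runtime
    # information we want to carry along.
    return contextlib.nullcontext(policy), contextlib.nullcontext(policy)


policy = {"save": ["matmul"]}  # made-up runtime policy
context_fn = functools.partial(make_contexts, policy)

# A partial keeps the wrapped callable and the bound arguments on its
# `func`, `args`, and `keywords` attributes.
forward_ctx, recompute_ctx = context_fn()
with forward_ctx as active_policy:
    pass
```

The caller only ever sees a zero-argument callable, while the policy travels inside the partial.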
# The problem
The previous implementation wouldn't work with partials because `FunctoolsPartialVariable` doesn't have a `fn` attribute.
This PR addresses this case, but ideally we could get this solved in a more general fashion, as callable classes and `NestedUserFunctionVariable` are not supported by this PR.
# Tests
I've added a basic test that mimics the tests around it. The tests could probably be simplified, but I've decided to keep changes to a minimum.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117657
Approved by: https://github.com/yf225
For training graphs (when inputs require grad), we would previously speculate the forward and backward graphs to determine whether there were any graph breaks, side effects, etc., but would not actually use these speculated graphs. We would just insert a call function node in the graph and later rely on autograd's tracing.
This approach does not work for more general graphs, such as graphs that include user-defined triton kernels, because autograd is not able to do the higher-order function conversion.
This PR speculates the forward and backward functions and emits them in a HOF that is later used via a templating mechanism.
While working on this PR, I exposed some bugs in the current tracing: trampoline functions were losing source information, which resulted in incorrect graphs being produced. I have fixed these source information bugs and killed the trampolines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116897
Approved by: https://github.com/Skylion007, https://github.com/jansel, https://github.com/voznesenskym
## Motivation
The current code, `value in [torch.backends.cudnn, torch.ops]`, requires `value` to implement `__eq__`. If the value is a custom object that does not implement `__eq__`, dynamo throws an error. For example, for ConvolutionOpContext, a custom 'torch._C.ScriptClass' object registered in IPEX, dynamo throws the following error:
**torch._dynamo.exc.InternalTorchDynamoError: '__eq__' is not implemented for __torch__.torch.classes.ipex_prepack.ConvolutionOpContext**
I think this is a common issue. To avoid it, this PR replaces the current code `value in [torch.backends.cudnn, torch.ops]` with `isinstance(value, (torch.backends.cudnn.CudnnModule, torch._ops._Ops))`.
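The difference can be reproduced without torch. In this sketch, `OpaqueContext` is a hypothetical stand-in for a custom object whose `__eq__` raises, and plain module objects stand in for the real list entries.

```python
import types


class OpaqueContext:
    """Hypothetical custom object whose equality check is unsupported."""

    def __eq__(self, other):
        raise NotImplementedError("'__eq__' is not implemented")

    __hash__ = object.__hash__


known_modules = [types.ModuleType("backend_a"), types.ModuleType("backend_b")]
value = OpaqueContext()

try:
    value in known_modules  # the comparison falls back to value.__eq__(module)
    membership_raised = False
except NotImplementedError:
    membership_raised = True

# The type check never touches __eq__.
is_module = isinstance(value, types.ModuleType)
```

The membership test raises, while the `isinstance` check simply returns `False`.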
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116856
Approved by: https://github.com/jansel
* This is an old builtin function equivalent to the bool constructor. It is easy enough to add support for.
* I also realized the tests were in the wrong class (the one reserved for testing default args) so I moved them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117463
Approved by: https://github.com/jansel
This is a placeholder implementation for reconstructing streams via global storage to unblock FSDP, pending a proper stream support design.
This PR does a few things:
1) Fixes registration for devices with indices. We were only supporting "cuda"; we now support "cuda:k" interfaces, where k is the GPU number.
2) Changes the stream objects in dynamo to take devices as device types instead of strings, and updates the string-based device APIs to gracefully take device types.
3) Introduces reconstruct-by-global (using the existing cleanup hook structures) for streams as a placeholder implementation for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117386
Approved by: https://github.com/jansel
Previously, kwargs were incorrectly dispatched by passing them as the true kwargs to the torch function call. To fix, the kwargs of the original torch op need to be stored in a dictionary and passed as an argument to the torch function implementation.
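The distinction can be sketched in plain Python. `torch_function_handler` mirrors the `(func, types, args, kwargs)` shape of a `__torch_function__` implementation, and `scale` is a hypothetical stand-in for a torch op; neither is dynamo code.

```python
def torch_function_handler(func, types, args=(), kwargs=None):
    # The op's keyword arguments arrive as one dictionary argument.
    kwargs = kwargs or {}
    return func(*args, **kwargs)


def scale(x, *, factor=1):
    # Stand-in for a torch op that accepts a keyword argument.
    return x * factor


op_kwargs = {"factor": 3}

# Incorrect: spreading the op's kwargs makes them keyword arguments of the
# handler itself, which does not accept `factor`.
try:
    torch_function_handler(scale, (), (2,), **op_kwargs)
    spread_failed = False
except TypeError:
    spread_failed = True

# Correct: pass the op's kwargs as a single dictionary argument.
result = torch_function_handler(scale, (), (2,), op_kwargs)
```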
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117083
Approved by: https://github.com/drisspg
This prepares for the PR where we implement sets in terms of dicts.
To do so, rather than internally storing a dictionary that maps literals
to VariableTrackers, it stores (pretty much) a dictionary from VTs to VTs.
To make this work, keys are wrapped in an opaque internal class, _Hashable.
The _Hashable class is opaque on purpose so that it fails hard if
it inadvertently leaks back into user code.
We also found and fixed a number of latent bugs and inconsistencies
in the way dynamo checked what can be a dict key. More generally, we
make much clearer what are the things that need to be modified to add
a new supported key type to Dicts.
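The key-wrapping idea can be sketched as follows. `HashableKey` and `Tracked` are hypothetical names for illustration only; they are not the `_Hashable` or VariableTracker classes from this PR.

```python
class Tracked:
    """Hypothetical stand-in for a tracked value."""

    def __init__(self, value):
        self.value = value


class HashableKey:
    """Hypothetical wrapper that makes a tracked object usable as a dict key."""

    def __init__(self, tracked):
        self.tracked = tracked

    def __hash__(self):
        return hash(self.tracked.value)

    def __eq__(self, other):
        if not isinstance(other, HashableKey):
            # Fail hard rather than silently compare against a leaked object.
            raise TypeError("HashableKey compared with a non-wrapper")
        return self.tracked.value == other.tracked.value


# A dictionary from wrapped tracked keys to tracked values.
items = {HashableKey(Tracked("a")): Tracked(1)}
found = items[HashableKey(Tracked("a"))].value
```

Supporting a new key type in such a design means teaching the wrapper how to hash and compare it, which keeps that logic in one place.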
Fixes [#107595](https://www.internalfb.com/tasks?t=107595)
Fixes [#111603](https://www.internalfb.com/tasks?t=111603)
Re-PR of https://github.com/pytorch/pytorch/pull/111196. Sadly, due to reverts, we could not reuse @lezcano's original PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116785
Approved by: https://github.com/mlazos
This is a proof-of-concept implementation of how people can use a marker, `mark_strict`, to enable torchdynamo while exporting in non-strict mode. The main idea is that `mark_strict` turns into an HOO, which then uses dynamo to do correctness analysis the same way torch.cond works today. There are some notable limitations:
1. This API is not meant for public use yet
2. Strict region can't work with arbitrary container inputs
3. We don't preserve `nn_module_stack` and other node metadata for the strict region.
4. The strict_mode HOO will show up in the final graph. This is undesirable in the long term, but for short-term experiments it should be good enough. This will be fixed in a follow-up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114658
Approved by: https://github.com/ydwu4
We change manually_set_subgraph_inputs into an option with three modes: manual, automatic and flatten_manual. The flatten_manual mode first flattens the sub_args and then recursively calls with set_subgraph_inputs = "manual". This allows us to control the order in which placeholders appear in the graph, which is necessary for map, where we want to keep the mapped arguments before the remaining positional arguments.
Right now, map only takes a single tensor as the mapped argument, but it would become pretty easy to match the subgraph inputs to the original proxies if we have a "flatten_manual" option.
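A toy sketch of the ordering guarantee, in plain Python. `flatten` and `placeholders_flatten_manual` are hypothetical helpers written for this example and do not reflect the actual implementation.

```python
def flatten(args):
    # Depth-first flattening of nested tuples and lists into leaves.
    leaves = []
    for arg in args:
        if isinstance(arg, (tuple, list)):
            leaves.extend(flatten(arg))
        else:
            leaves.append(arg)
    return leaves


def placeholders_flatten_manual(mapped_args, other_args):
    # Flatten first, then create one placeholder per leaf in the given
    # order, so mapped arguments always precede the remaining ones.
    return [f"placeholder:{leaf}" for leaf in flatten([mapped_args, other_args])]


order = placeholders_flatten_manual(("xs", ("ys",)), ["w", ["b"]])
```

Because the placeholders are created explicitly from the flattened list, their order follows the argument order rather than the order in which the subgraph happens to use them.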
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115853
Approved by: https://github.com/zou3519
For training graphs (when inputs require grad), we would previously speculate the forward and backward graphs to determine whether there were any graph breaks, side effects, etc., but would not actually use these speculated graphs. We would just insert a call function node in the graph and later rely on autograd's tracing.
This approach does not work for more general graphs, such as graphs that include user-defined triton kernels, because autograd is not able to do the higher-order function conversion.
This PR speculates the forward and backward functions and emits them in a HOF that is later used via a templating mechanism.
While working on this PR, I exposed some bugs in the current tracing: trampoline functions were losing source information, which resulted in incorrect graphs being produced. I have fixed these source information bugs and killed the trampolines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116358
Approved by: https://github.com/jansel