Request from @tugsbayasgalan: we want dynamo to preserve some FX node metadata when we trace `GraphModule`s (`nn_module_stack`, `source_fn`, `stack_trace`). This is helpful for the case where we export an aten-level `GraphModule`, add some (possibly non-torch or non-aten) ops, and then want to transform the graph back into an aten-level graph. Without preserving metadata, future passes that look at metadata (e.g. quantization passes) won't work.
This feature has the additional benefit of preserving the originating line of code when `print_readable`'ing a `GraphModule`. This is helpful when debugging graphs that have passed through dynamo several times.
The added unit test demonstrates the added functionality of this PR.
This PR preserves node metadata across dynamo by doing the following:
- construct a line number -> node index map in `GraphModule` as the source code is being generated
- pass a list of node metadata and the line number map to dynamo's bytecode analyzer
- when we create a new proxy, get the current instruction's line number and look up the node index in the line number map
- index into the original node metadata using that node index

An earlier proof-of-concept instead injected a counter variable into the `GraphModule` source code (incremented every time a node ran) and traced it as a `ConstantVariable` to recover the original node index. Other alternatives that were considered and rejected were matching on node names (we only have access to the new node names, not the old ones) and assuming that proxies are created in the same order as the original nodes (not all created nodes go through `create_proxy`, e.g. inputs). A rough usage sketch follows.
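The sketch below assumes the `torch._dynamo.export(fn, *example_inputs)` calling convention and is only meant to illustrate where the preserved metadata shows up; it is not code from this PR.

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.nn.functional.relu(self.linear(x))

# First export produces an aten-level GraphModule; re-exporting it through dynamo
# should keep nn_module_stack / source_fn / stack_trace on the new nodes.
gm, _ = torch._dynamo.export(M(), torch.randn(2, 4))
gm2, _ = torch._dynamo.export(gm, torch.randn(2, 4))
for node in gm2.graph.nodes:
    if node.op == "call_function":
        print(node.name, node.meta.get("nn_module_stack"), node.meta.get("source_fn"))
```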
Differential Revision: [D49257108](https://our.internmc.facebook.com/intern/diff/D49257108)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107067
Approved by: https://github.com/jansel
Add semantics for creating a buffer object analogous to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type indicates whether the buffer should be persistent or not. The other non-test changes get the new `Buffer` type recognized by inductor and dynamo. The remaining test changes make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, since it just leads to `register_buffer` being called. Normal tensors can still be used as buffers, so these changes are intended to be backwards compatible. A minimal usage sketch follows.
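A minimal sketch of the new usage, assuming the class is exposed as `torch.nn.Buffer` (the exact import location is an assumption):

```python
import torch
from torch.nn import Buffer  # assumed export location

class Norm(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Equivalent to: self.register_buffer("running_mean", torch.zeros(4), persistent=False)
        self.running_mean = Buffer(torch.zeros(4), persistent=False)

    def forward(self, x):
        return x - self.running_mean
```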
Fixes #35735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
Previously, you'd get `<eval_with_key>.0`; now you get `<eval_with_key>.0 from /data/users/ezyang/b/pytorch/test/dynamo/test_misc.py:5683 in forward`
I used to do this with globals, but now I do it with a `co_fields` parameter that's plumbed around, because putting things in globals has implications(TM). Happy to bikeshed on the `co_fields` structure.
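For reference, a hedged sketch of what a `co_fields` payload might look like; the key names are a guess based on the example output above and are not confirmed by this description:

```python
# Hypothetical co_fields contents (key names are an assumption): metadata about the
# frame/code object the graph was traced from, used to enrich "<eval_with_key>" names.
co_fields = {
    "co_name": "forward",
    "co_filename": "/data/users/ezyang/b/pytorch/test/dynamo/test_misc.py",
    "co_firstlineno": 5683,
}
```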
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103885
Approved by: https://github.com/albanD
Summary:
When we pickle/unpickle a graph module in multipy, we would lose modules/attributes that are not referred to in the graph. This is because when unpickling an FX graph module, we use the stored `__dict__` and the FX graph to create a new graph module, and in `GraphModule.__init__` we drop any attribute that is not referred to in the graph.
This behavior is not ideal because we actually expect a graph module that is exactly the same after unpickling.
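A rough sketch of the expected behavior after this change (the attribute name and shapes are illustrative):

```python
import pickle

import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

gm = torch.fx.symbolic_trace(M())
# A buffer that the graph itself never references.
gm.register_buffer("aux_buffer", torch.ones(3))

gm2 = pickle.loads(pickle.dumps(gm))
# With this change, the unused attribute should survive the round trip.
assert hasattr(gm2, "aux_buffer")
```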
Test Plan:
```
buck test mode/opt caffe2/test:fx -- test_preserve_unused_attr_after_unpickle
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```
Differential Revision: D46976230
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104115
Approved by: https://github.com/houseroad
Summary:
# Context
In TorchRec's train pipeline, we need to fx trace a module to analyze the arguments on the forward call. In order to do this, we need to preserve some sort of meaning with each argument (a key or name of sorts that lets us identify the argument).
The issue is that when you use concrete args, fx will internally flatten the arg into its constituents (to locate PHs).
Given a function that looks like this:
```
def process(batch: Dict[str, torch.Tensor]):
    ...

symbolic_trace(process, concrete_args={"batch": {"f1": PH, "f2": PH}})

# The function will be rewritten to look like:
def process(batch_1, batch_2):  # batch_1 -> "f1", batch_2 -> "f2"
    ...
```
When you traverse the nodes of the graph, the names of the argument nodes to the function are `batch_1` and `batch_2`. **This doesn't mean anything to the user who is fx tracing.** There isn't anything indicating that `batch_1` corresponds to key `"f1"` in the batch input.
# Solution
When fx sees a "PH", it creates a proxy node.
The user does not have direct access to proxy creation, but only through the PH structure.
Attach a piece of metadata, `ph_key`, to the PH when you set it in the concrete args; it will get passed into proxy and node creation. So when you traverse the graph, this metadata sticks onto the node as an attribute. This way you have a way of tagging `batch_1` as `"f1"`. A short consumption sketch follows.
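The sketch below assumes (per the description above) that the tag lands on the placeholder node as a `ph_key` attribute; it is illustrative, not code from this diff.

```python
import torch.fx

def placeholder_keys(traced: "torch.fx.GraphModule"):
    # Walk the traced graph and recover the original input key for each
    # flattened placeholder, assuming the ph_key tag is attached to the node.
    keys = {}
    for node in traced.graph.nodes:
        if node.op == "placeholder":
            keys[node.name] = getattr(node, "ph_key", None)
    return keys  # e.g. {"batch_1": "f1", "batch_2": "f2"}
```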
Test Plan: added a unit test
Reviewed By: dstaay-fb
Differential Revision: D44947653
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102195
Approved by: https://github.com/PaliC
Summary: Change the placeholder check from an identity check against the PH singleton to an `isinstance` check against `PHBase`, so you can create your own PH class with metadata. A minimal sketch follows.
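A minimal sketch of what this enables, assuming `PHBase` is importable from `torch.fx._symbolic_trace` (the import path is an assumption):

```python
import torch
from torch.fx import symbolic_trace
from torch.fx._symbolic_trace import PHBase  # assumed import location

# A custom placeholder carrying user metadata; accepted because the check is now
# isinstance(..., PHBase) rather than identity with the PH singleton.
class TaggedPH(PHBase):
    def __init__(self, key):
        super().__init__()
        self.key = key

def process(batch):
    return batch["f1"] + batch["f2"]

traced = symbolic_trace(
    process, concrete_args={"batch": {"f1": TaggedPH("f1"), "f2": TaggedPH("f2")}}
)
```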
Test Plan: added unit test
Reviewed By: joshuadeng
Differential Revision: D46085128
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102008
Approved by: https://github.com/PaliC
Added helper functions to match nodes in the graph that are decomposed from their source (leaf modules, or functional ops), as a result of dynamo tracing.
`get_source_partitions(graph: torch.fx.Graph, wanted_sources: List[Any]) -> Dict[Any, List[SourcePartition]]`
Args:
* graph: The graph we want to partition
* wanted_sources: List of sources of nodes that were decomposed from this source. This can be a function (ex. torch.nn.functional.linear) or a leaf module type (ex. torch.nn.Linear)
Returns:
* Dictionary mapping sources (ex. torch.nn.modules.linear.Linear) to a list of SourcePartitions that correspond to the list of nodes that were flattened from a module of that type.
```
@dataclass
class SourcePartition:
# Nodes in a particular partition
nodes: List[Node]
# Module type
module_type: Type
# Nodes in the graph that are needed as inputs to the partition
input_nodes: List[Node] = field(default_factory=list)
# Nodes in the partition that are being used by nodes outside of the partition
output_nodes: List[Node] = field(default_factory=list)
# Parameters that are being used
params: List[str] = field(default_factory=list)
```
Example:
Original:
```
x -> linear -> linear -> relu -> linear
```
Traced graph:
```
.graph():
%arg0 : [#users=1] = placeholder[target=arg0]
%_param_constant0 : [#users=1] = get_attr[target=_param_constant0]
%t_default : [#users=1] = call_function[target=torch.ops.aten.t.default](args = (%_param_constant0,), kwargs = {})
%_param_constant1 : [#users=1] = get_attr[target=_param_constant1]
%addmm_default : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant1, %arg0, %t_default), kwargs = {})
%_param_constant0_1 : [#users=1] = get_attr[target=_param_constant0]
%t_default_1 : [#users=1] = call_function[target=torch.ops.aten.t.default](args = (%_param_constant0_1,), kwargs = {})
%_param_constant1_1 : [#users=1] = get_attr[target=_param_constant1]
%addmm_default_1 : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant1_1, %addmm_default, %t_default_1), kwargs = {})
%relu_default : [#users=1] = call_function[target=torch.ops.aten.relu.default](args = (%addmm_default_1,), kwargs = {})
%_param_constant2 : [#users=1] = get_attr[target=_param_constant2]
%t_default_2 : [#users=1] = call_function[target=torch.ops.aten.t.default](args = (%_param_constant2,), kwargs = {})
%_param_constant3 : [#users=1] = get_attr[target=_param_constant3]
%addmm_default_2 : [#users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%_param_constant3, %relu_default, %t_default_2), kwargs = {})
return [addmm_default_2]
```
Result of `get_source_partitions`:
```
{<class 'torch.nn.modules.linear.Linear'>: [
ModulePartition(nodes=[_param_constant0, t_default, _param_constant1, addmm_default], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[arg0], output_nodes=[addmm_default], params=["_param_constant0", "_param_constant1"]),
ModulePartition(nodes=[_param_constant0_1, t_default_1, _param_constant1_1, addmm_default_1], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[addmm_default], output_nodes=[addmm_default_1], params=["_param_constant0_1", "_param_constant1_1"]),
ModulePartition(nodes=[_param_constant2, t_default_2, _param_constant3, addmm_default_2], module_type=<class 'torch.nn.modules.linear.Linear'>, input_nodes=[relu_default], output_nodes=[addmm_default_2], params=["_param_constant2", "_param_constant3"])],
<class 'torch.nn.modules.activation.ReLU'>: [
ModulePartition(nodes=[relu_default], module_type=<class 'torch.nn.modules.activation.ReLU'>, input_nodes=[addmm_default_1], output_nodes=[relu_default], params=[])]}
```
Also added a helper function to check whether two partitions are connected (a usage sketch follows the signature):
`check_subgraphs_connected(subgraph1: SourcePartition, subgraph2: SourcePartition) -> bool`
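A short usage sketch; the helpers' import location (`torch.fx.passes.utils.source_matcher_utils`) is an assumption:

```python
import torch
from torch.fx.passes.utils.source_matcher_utils import (
    get_source_partitions,
    check_subgraphs_connected,
)

def find_linear_relu_pairs(gm: torch.fx.GraphModule):
    # Partition the traced graph by the source module/function each node came from.
    partitions = get_source_partitions(gm.graph, [torch.nn.Linear, torch.nn.ReLU])
    linears = partitions.get(torch.nn.Linear, [])
    relus = partitions.get(torch.nn.ReLU, [])
    # Return (linear, relu) partition pairs that are directly connected.
    return [(l, r) for l in linears for r in relus if check_subgraphs_connected(l, r)]
```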
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98628
Approved by: https://github.com/cccclai
Summary: There are some customized functions that we would also like to keep during the eliminate-dead-code pass. Add a function to help us do that; a hedged sketch of possible usage follows.
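A sketch of how such a hook might be used; the helper name and location `torch.fx.node.has_side_effect` are our assumption, not confirmed by this summary:

```python
import torch
import torch.fx
from torch.fx.node import has_side_effect  # assumed name/location of the new helper

# Mark a custom function as side-effectful so Graph.eliminate_dead_code() keeps
# calls to it even when their outputs are unused.
@has_side_effect
def log_shape(x):
    print(x.shape)
    return x
```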
Test Plan: Added a unit test
Differential Revision: D44273630
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97288
Approved by: https://github.com/houseroad
Summary:
Currently torch.fx supports Modules with namedtuple/dataclass inputs and namedtuple returns, but does not allow `Module.forward` to return a dataclass. Running `test_trace_return_dataclass` without this change gives the following error:
```
NotImplementedError: argument of type: <class 'test_fx.TestFX.test_trace_return_dataclass.<locals>.MyOutput'>
  File "test_trace_return_dataclass
    traced_graph = symbolic_trace(module).graph
  File "test/__fx__/fx#link-tree/torch/fx/_symbolic_trace.py", line 1114, in symbolic_trace
    graph = tracer.trace(root, concrete_args)
  File "test/__fx__/fx#link-tree/torch/fx/_symbolic_trace.py", line 783, in trace
    (self.create_arg(fn(*args)),),
  File "test/__fx__/fx#link-tree/torch/fx/_symbolic_trace.py", line 378, in create_arg
    return super().create_arg(a)
  File "test/__fx__/fx#link-tree/torch/fx/proxy.py", line 269, in create_arg
    raise NotImplementedError(f"argument of type: {type(a)}")
```
This diff handles the dataclass type.
Test Plan:
```
buck test @//mode/opt @//mode/inplace //caffe2/test:fx -- test_trace_

graph():
    %d : torch.Tensor [#users=1] = placeholder[target=d]
    %my_output : [#users=1] = call_function[target=test_fx.MyOutput](args = (), kwargs = {foo: %d, bar: %d})
    return my_output
```
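A minimal sketch of the newly supported pattern (module and dataclass names are illustrative, mirroring the test graph above):

```python
from dataclasses import dataclass

import torch
from torch.fx import symbolic_trace

@dataclass
class MyOutput:
    foo: torch.Tensor
    bar: torch.Tensor

class M(torch.nn.Module):
    def forward(self, d):
        # Returning a dataclass from forward is now traceable.
        return MyOutput(foo=d, bar=d)

print(symbolic_trace(M()).graph)
```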
Differential Revision: D44916519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99576
Approved by: https://github.com/suo
Twice this week I have had people confuse "operator defined with Python
operator registration aka torch.library" and "PyOperator which is used
to define control flow operators and other operators that cannot be
represented in JIT schema." Renaming PyOperator for clarity.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97493
Approved by: https://github.com/SherlockNoMad
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
To match nodes within the graph, the matcher currently flattens the arguments and compares them one by one. However, if it believes that a list input contains only literals, it will not flatten the list and will instead compare the lists directly. It determines whether a list is all literals by checking if the first element is a node. This doesn't work in some cases (like the test cases added here). A schematic sketch of the check follows.
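A schematic illustration of the flawed check versus the fix; these helpers are illustrative, not the matcher's actual code:

```python
from torch.fx import Node

# Buggy: classifies a list by its first element only, so a mixed list such as
# [1, some_node] is wrongly treated as all-literal and never flattened.
def is_all_literal_buggy(xs):
    return isinstance(xs, (list, tuple)) and not isinstance(xs[0], Node)

# Fixed: a list is all-literal only if none of its elements is a Node.
def is_all_literal_fixed(xs):
    return isinstance(xs, (list, tuple)) and not any(isinstance(x, Node) for x in xs)
```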
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94375
Approved by: https://github.com/SherlockNoMad
Fixes https://github.com/pytorch/pytorch/issues/89421
The strategy is to patch the given function wrapped with `@torch.fx.wrap` so that if a tensor tracer is active, we will `proxy_call` the function.
`proxy_call` will also skip certain checks if the function to proxy-call is not a torch op (checked with `isinstance(..., OpOverload)`).
@IvanYashchuk @ezyang @Chillee
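A rough sketch of the intended effect, assuming `make_fx` from `torch.fx.experimental.proxy_tensor` is the tensor tracer in play; per the description above, the wrapped helper should appear as a single call in the traced graph rather than erroring out:

```python
import torch
import torch.fx
from torch.fx.experimental.proxy_tensor import make_fx

@torch.fx.wrap
def my_helper(x):
    return x + 1

def f(x):
    return my_helper(x) * 2

gm = make_fx(f)(torch.randn(3))
print(gm.graph)  # expected: my_helper shows up as a call_function node
```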
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93273
Approved by: https://github.com/ezyang
Summary:
One such place where a circular reference can occur: `_load_state_dict_pre_hooks` contains a `_WrappedHook`, and the `_WrappedHook` holds a weakref to the same module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93038
Approved by: https://github.com/jerryzh168
In 3.11 bytecode size is not constant, so in order to get from `f_lasti` to an opcode index, one needs to search for the closest offset in the disassembled instructions.
Update `_patch_function` to construct code with all the properties that exist in the 3.11 runtime.
Update `_torchscript_schema_to_signature` to mark the `from` named arg as positional-only, as `from` is a reserved keyword in Python and as such is checked by the `inspect` package in 3.11.
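An illustrative sketch of the offset search described above (not the exact code in this patch):

```python
import dis

def lasti_to_instruction_index(code, lasti):
    # In 3.11+ instructions vary in size, so map f_lasti to the index of the
    # instruction with the greatest offset that does not exceed it.
    offsets = [inst.offset for inst in dis.get_instructions(code)]
    return max(i for i, off in enumerate(offsets) if off <= lasti)
```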
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92895
Approved by: https://github.com/albanD
# Summary
In preparation for the PT 2.0 launch, this PR updates SDPA's API and makes the function a public `nn.functional` function.
## Changes
### API
Previously, the function signature was:
`scaled_dot_product_attention(query, key, value, attn_mask=None, need_attn_weights=False, dropout_p=0.0, is_causal=False) -> (Tensor, Tensor)`
Updated signature:
`scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False) -> Tensor`
This PR removes the `need_attn_weights` optional boolean argument and updates the return type to a single tensor (example call below).
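For illustration, a call with the updated signature (shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
print(out.shape)  # a single tensor is returned
```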
#### Reasoning:
The main goal of this function is to provide an easy interface for users to call into fused attention kernels, e.g. FlashAttention. The fused kernels do not currently support an arbitrary attn_mask or dropout, but there is a PR to mem-efficient attention to enable these. We want to have the API surface ready for when the backing kernels get updated.
The fused kernels save on memory usage by not materializing the attention weights, and it is unlikely that a fast fused implementation will support returning them, so we are removing the option.
Discussed with folks at FAIR/xFormers, who +1'd this API change.
#### Make function Public
In preparation for the PT 2.0 launch, we make the function public to start gathering user feedback.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92189
Approved by: https://github.com/cpuhrsch
Summary:
This PR supports the following feature for QConfigMapping:
```
qconfig_mapping = QConfigMapping().set_object_type(torch.nn.Conv2d, qconfig)
backend_config = get_qnnpack_pt2e_backend_config()
m = prepare_pt2e(m, qconfig_mapping, example_inputs, backend_config)
```
which means users want all calls to `torch.nn.Conv2d` to use `qconfig`. Note this is only verified right now for the case when the module is broken down into a single aten op, e.g. `torch.nn.Conv2d` becomes the `torch.ops.aten.convolution` op when traced through. We will need to support more complicated modules that are broken down into multiple operators later, e.g. MaxPool.
Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_qconfig_module_type
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92355
Approved by: https://github.com/jcaip
This PR:
- Updates the docs to say it is deprecated
- Raises a UserWarning
- Changes most of the callsites inside PyTorch to use
torch.func.functional_call, minus the test_stateless testing.
The motivation behind this is that we can now align behind a single
functional_call API in PyTorch.
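For reference, a minimal usage sketch of the single API being aligned on:

```python
import torch
from torch.func import functional_call

m = torch.nn.Linear(3, 3)
params_and_buffers = dict(m.named_parameters())
out = functional_call(m, params_and_buffers, (torch.randn(1, 3),))
```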
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92280
Approved by: https://github.com/albanD
Fixes: https://github.com/pytorch/pytorch/issues/88010
This PR does a couple things to stop slow gradcheck from timing out:
- Splits out test_ops_fwd_gradients from test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?)
- Skips CompositeCompliance (and several other test files) for slow gradcheck CI since they do not use gradcheck
- because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see actual test times are updated. ("def calculate_shards" randomly divides tests with unknown test times in a round-robin fashion.)
- Updates references to test_ops_gradients and TestGradients
- Test files that are skipped for slow gradcheck CI are now centrally located in run_tests.py. This reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. for test_mps
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216
Approved by: https://github.com/albanD
Re-submit of gh-72302
This still has a small performance hit, but it is much smaller. On my
machine I see `_record_function_exit._RecordFunction` takes 1.05 us
compared to the `Tensor` overload taking 0.79 us.
In an overall comparison, I see a 0.7 us slowdown from 6.0 us to
6.7 us for this timeit benchmark
```python
import torch

def foo():
    with torch.profiler.record_function("foo"):
        return torch.eye(3)

%timeit foo()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76420
Approved by: https://github.com/robieta
Summary:
Given the original execution order and the node dependency relationship (note that the same dependencies can admit multiple valid execution orders, i.e. topological orders), after reunion we can find that the execution order of the new GraphModule differs from the original one, which is not what we want.
For example, let's assume that NewLeaf_1 is EmbeddingLookup (calling EmbeddingLookup is awaitable; we keep executing the following nodes rather than waiting for the result until we have to know the lookup result), and NewLeaf_4 is the node where we HAVE to get the lookup result to interact with NewLeaf_3. So NewLeaf_1 will launch a lookup kernel and an all2all communication stream to distribute the result to all ranks. In the meantime, we want to keep executing NewLeaf_2 and NewLeaf_3 to avoid meaningless waiting. However, given the new execution order, we have to wait for the lookup kernel and all2all communication to finish since the next node, NewLeaf_4, needs the result, and only then can we execute NewLeaf_2, etc. This forfeits the overlap between computation and communication streams and hurts QPS a lot.
So while constructing the GraphModule, we have to use the original execution order rather than an arbitrary topological order.
Test Plan:
Unit test
Not sure how to add tests in FX as there's no TARGETS, so I added them in the TorchRec folder
Differential Revision: D39567314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85188
Approved by: https://github.com/SherlockNoMad
Ran across this issue while using `torch.fx.wrap` on a decorated function: it triggered a `KeyError: 'wrapper_inside_decorator'`. `torch.fx.wrap` stores `function.__code__.co_name`, but that isn't set correctly (and doesn't match the function's name in the global namespace) for decorated functions; `function.__name__` is set correctly.
Also adjusted the check to test whether the argument is callable instead of checking for the existence of `__code__`, to allow a broader variety of functions to be passed in. E.g. `functools.cache` returns a callable that won't have a `__code__` attribute.
I added a unit test (that incidentally fails every test in the suite before the fix commit -- because it affects the global state), and then a fix that addresses it.
```
In [1]: import functools
In [2]: def decorator(f):
   ...:     @functools.wraps(f)
   ...:     def wrapper(*args, **kwargs):
   ...:         return f(*args, **kwargs)
   ...:     return wrapper
   ...:
In [3]: @decorator
   ...: def some_function(x):
   ...:     return x
   ...:
In [4]: some_function.__name__
Out[4]: 'some_function'
In [5]: some_function.__code__.co_name
Out[5]: 'wrapper'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84373
Approved by: https://github.com/jamesr66a, https://github.com/SherlockNoMad
Fixing [failing](https://github.com/pytorch/pytorch/runs/8083404365?check_suite_focus=true) tests by adjusting the input size for S3D. The reason the test is failing is that S3D requires a bigger input size than was previously passed.
As noted before, TorchVision already checks that its models are FX traceable and ensures all the tests are updated and work properly prior to adding new architectures. The tests here seem to duplicate our efforts and often break because they don't factor in details about each model. It might be worth considering running TorchVision's tests instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84526
Approved by: https://github.com/pbelevich
Resolves [breakages](https://github.com/pytorch/pytorch/runs/7762125339?check_suite_focus=true) observed at #82560
Context:
The current FX tests assume that every public method under `torchvision.models` is a model builder method. To get a list of those methods, they query the `__dict__` attribute of the module. Unfortunately this assumption is not true and the tests already contain some workarounds to filter some methods. A better approach would be to query TorchVision for all of its available models under a specific module. This is exactly what the new Registration API can help us do and that's what we use in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83187
Approved by: https://github.com/ezyang
PassManager is a class used to run multiple passes on a given graph module.
Class Attributes:
* `passes: List[Callable]`: A list of callable passes
* `constraints: List[Callable]`: A list of constraints
* `run_checks_after_each_pass`: Flag for running checks after each pass
Class Methods:
* `__call__(graph_module: DispatchGraphModule)`:
* Runs the passes from the list of passes until the graph stops changing, or for at most `steps` iterations.
* Each time a pass is run, it will check that the graph module still maintains the required invariants by calling `check()` and will lint the graph to check that it’s well formed if the flag `run_checks_after_each_pass` is set.
* `check(graph_module: DispatchGraphModule)`: Runs various checks on the given graph module to make sure that it contains the needed data for passes
* `add_check(check: Callable)`: Adds the `check` function to the given pass manager instance
* `add_constraint(constraint: Callable)`: Adds a constraint to the current list of constraints
We can create a PassManager and run it by doing:
```
PassManager(passes=[pass1, pass2])(graph_module)
```
Differential Revision: [D37523159](https://our.internmc.facebook.com/intern/diff/D37523159)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80531
Approved by: https://github.com/SherlockNoMad
There are small typos in:
- caffe2/python/recurrent.py
- test/distributed/test_c10d_nccl.py
- test/test_fx.py
- torch/csrc/jit/runtime/autodiff.cpp
- torchgen/gen.py
Fixes:
- Should read `propagation` rather than `propogation`.
- Should read `multiplied` rather than `multuplied`.
- Should read `eliminate` rather than `elminate`.
- Should read `dispatcher` rather than `disaptcher`.
Semi-automated pull request generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81435
Approved by: https://github.com/ngimel
If installed with pep517 support, `torchvision` will be built against the released version of PyTorch rather than against the one currently installed on the system
Also update `torchvision` hash to 8a45147f9d and:
- Added `maskrcnn_resnet50_fpn_v2`, `maskrcnn_resnet50_fpn_v2`, `retinanet_resnet50_fpn_v2`, `ssd300_vgg16`, `fcos_resnet50_fpn` and `ssdlite320_mobilenet_v3_large` to the list of untraceable models
- Set default input size to (1, 3, 16, 224, 224) for `mvit_v1_b` model
- Skipped `test_roi_aligned`,`test_batched_nms`, `test_roi_pooled` and `test_roi_align_aligned` ONNX test (tracked in https://github.com/pytorch/pytorch/issues/81121 )
- Skipped TorchVision integration tests in `test_package` (tracked in https://github.com/pytorch/pytorch/issues/81115 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81074
Approved by: https://github.com/kit1980
Summary: The root module may have different forward functions. The current implementation assumes that only the func `forward` can be traced. In this PR, we add a forward-function-name attribute to the Tracer class to enable users to trace different functions. A hedged usage sketch follows.
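A sketch of how this might be used; the attribute spelling `traced_func_name` is our assumption, not confirmed by this summary:

```python
import torch
from torch.fx import Tracer, GraphModule

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

    def other_forward(self, x):
        return x * 2

m = M()
tracer = Tracer()
tracer.traced_func_name = "other_forward"  # assumed attribute name
graph = tracer.trace(m)
gm = GraphModule(m, graph)
print(gm.graph)
```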
Test Plan:
python3 test/test_fx.py TestFX.test_trace_multiple_funcs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77502
Approved by: https://github.com/jamesr66a
Summary: The root module may have different forward functions. The current implementation assumes that only the func `forward` can be traced. In this diff, we add a forward-function-name argument to enable users to trace different forward functions.
Test Plan: N1903198
Differential Revision: D36157032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77109
Approved by: https://github.com/jamesr66a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76253
We're observing a large QPS regression on the original PR https://github.com/pytorch/pytorch/pull/72302. For the training job we had, it regressed from 720k QPS to 450k QPS (see the test plan in FB internal). We suspect this is because the API was changed from `_record_function_enter` to `_record_function_enter_new`, and we're running experiments to confirm that. Will add more details when the runs in the test plan have finished. For now, it's better to revert the diff to unblock internal use cases, and we can think about how to reland it later.
Original commit changeset: dc9939f1fa6d
Original Phabricator Diff: D35257354
Test Plan:
on trunk: f338665947
with this diff: f338502850
Reviewed By: malfet, robieta
Differential Revision: D35853300
fbshipit-source-id: dd38042aeacb848f66756491a4c849c7c652a0e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74662
Previously, we would not emit a check that a `concrete_args` entry with value `None` matched that value at runtime. This fixes that and improves some of the warning messages.
Test Plan: Imported from OSS
Reviewed By: Chillee
Differential Revision: D35137362
Pulled By: jamesr66a
fbshipit-source-id: 222a2c8a907748f90290f1c1b4ab8012b46099a0
(cherry picked from commit b960405ad87e57dcf62ca25dd4d4bdfc34c8744c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74637
Forgot to update the expect file in https://github.com/pytorch/pytorch/pull/74242. Reland to include changes in expect file.
Test Plan: unit test
Reviewed By: yinghai
Differential Revision: D35089989
fbshipit-source-id: 5e3ad9c696cf31cbc691d34fdb77eff26f92e38d
(cherry picked from commit 110ac12f5e2bcca7552d4b4691c7d98fafb21a57)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74242
The inputs and outputs of the graph module might be different from the graph inputs and outputs if users are using a custom codegen. The interpreter runs the graph instead of the generated forward function, so it might not work if the user provides the module-level inputs. To fill the gap, we call `process_inputs` and `process_outputs` inside the interpreter.
Test Plan: unit test: test_interpreter_with_codegen
Reviewed By: jamesr66a, Chillee
Differential Revision: D34898108
fbshipit-source-id: 250bd236f6c8c1268a363cf19a09521a4f64b3a9
(cherry picked from commit b33076fa3b10788d455cecc590bc01c4ad8ef94c)
Summary:
The fx test case wasn't disabled properly because it didn't call the parent class' setUp().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74216
Reviewed By: zou3519
Differential Revision: D34898707
Pulled By: janeyx99
fbshipit-source-id: 83e56f5a1efc50d24646c182160f7cfcb5bc9935
(cherry picked from commit bb8dd72d1640c1ef0201d615c5d405479afdf078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74189
Use the codegen on the original graph module for the new graph module produced by transformer.
Test Plan: Added a unit test: test_custom_codegen_with_transformer
Reviewed By: yinghai
Differential Revision: D34867938
fbshipit-source-id: fcda6600faeccfa7a650ba7226ca125e8440b19c
(cherry picked from commit d098c12081f61ddcf69052db5b8a1f31b0a0b67b)