Commit Graph

227 Commits

Author SHA1 Message Date
Sherlock Huang
a8add2b92f Support matching Args for SubgraphMatcher (#85456)
Subgraph matcher now handles the matching of non-Node arguments.

Here are the 4 cases
- pn is a Node, gn is a Node: this goes through the regular _match_node() function
- pn is a Node, gn is not a Node: this is a match only if pn is a placeholder op
- pn is not a Node, gn is a Node: this is not a match
- pn is not a Node, gn is not a Node: this goes through the argument comparison

With this change
```
def target(x):
    return foo(x, 3)

def pattern(x, y):
    return foo(x, y)
```

is a match
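
For illustration, a rough usage sketch of the matcher with a literal argument (the import path, the use of `torch.add` in place of the abstract `foo`, and the matcher defaults are assumptions):
```
import torch
from torch.fx import symbolic_trace
# The matcher's module path is an assumption and may differ across versions.
from torch.fx.passes.utils.matcher_utils import SubgraphMatcher

def target(x):
    return torch.add(x, 3)   # literal 3 where the pattern has a placeholder

def pattern(x, y):
    return torch.add(x, y)

matcher = SubgraphMatcher(symbolic_trace(pattern).graph)
matches = matcher.match(symbolic_trace(target).graph)
# With non-Node argument matching, the literal 3 can bind to placeholder y,
# so `matches` should be non-empty.
```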

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85456
Approved by: https://github.com/jerryzh168
2022-09-24 20:06:48 +00:00
Renfei Chen
4befe45084 [FX] Add one option to maintain the FX graph execution order after splitting_module (#85188)
Summary:

Given the original execution order and the node dependency relationship (note that the same dependencies can admit multiple valid execution orders, i.e. topological orders), after reunion we may find that the execution order of the new GraphModule differs from the original one, which is not what we want.
For example, let's assume that NewLeaf_1 is EmbeddingLookup (calling EmbeddingLookup is awaitable: we keep executing the following nodes rather than waiting for the result until we actually need it), and NewLeaf_4 is the node where we HAVE to have the lookup result to interact with NewLeaf_3. NewLeaf_1 launches a lookup kernel and an all2all communication stream to distribute the result to all ranks. In the meantime, we want to keep executing NewLeaf_2 and NewLeaf_3 to avoid pointless waiting. However, under the new execution order, we have to wait for the lookup kernel and the all2all communication to finish because the next node, NewLeaf_4, needs the result; only then can we execute NewLeaf_2, etc. This fails to overlap computation with the communication stream and hurts QPS a lot.
So while constructing the GraphModule, we have to switch from the topological order back to the original order.
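
A minimal, self-contained sketch of how the option might be used (the `keep_original_order` flag name is my assumption based on this PR; the partitioning callback below is only illustrative):
```
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.split_module import split_module

class M(torch.nn.Module):
    def forward(self, x):
        a = x.relu()
        b = x + 1       # independent of `a`, so a topological sort could reorder it
        return a * b

m = M()
traced = symbolic_trace(m)

partition_ids = {}
def split_callback(node):
    # Put every node in its own partition, just to exercise the splitter.
    return partition_ids.setdefault(node.name, len(partition_ids))

# keep_original_order asks the splitter to call the submodules in the original
# execution order instead of a freshly computed topological order (assumption).
split_gm = split_module(traced, m, split_callback, keep_original_order=True)
```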

Test Plan:
Unit test

Not sure how to add tests in FX as there's no TARGETS file, so I added them in the TorchRec folder.

Differential Revision: D39567314

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85188
Approved by: https://github.com/SherlockNoMad
2022-09-23 23:21:54 +00:00
Sherlock Huang
34296e2f4c SubgraphMatcher remove invalid matches (#85444)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85444
Approved by: https://github.com/rkindi
2022-09-22 02:59:11 +00:00
Elias Ellison
8bd9fe3f49 Changes to prepare for fake tensors on by default in functorch (#84432)
Fixes some errors you run into in dynamo when turning on fake tensors. I'm waiting on flipping the switch because I need to also get some fixes into dynamo + do benchmarking.

I could manually turn off fake tensors in functorch in dynamo, and then turn it on here if requested, although the changes here are pretty minimal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84432
Approved by: https://github.com/Chillee
2022-09-08 04:29:30 +00:00
Wei Wei
31ef8ddb8c add option to remove passes (#84425)
Summary:
Add a remove_pass method in pass_manager to give users the option to remove any pass.
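
A hypothetical sketch of what removing a pass by name could look like (the real method's signature in pass_manager may differ):
```
from typing import Callable, List

class PassManagerSketch:
    """Illustrative only; not the actual torch.fx pass_manager.PassManager."""
    def __init__(self, passes: List[Callable]):
        self.passes = list(passes)

    def remove_pass(self, names: List[str]) -> None:
        # Drop any registered pass whose function name appears in `names`.
        self.passes = [p for p in self.passes if p.__name__ not in names]
```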

Reviewed By: wushirong

Differential Revision: D39080077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84425
Approved by: https://github.com/yinghai
2022-09-07 17:21:27 +00:00
Qiming Lu
e71370064c Improvements to FX Minimizer (#83833)
Summary: This diff improves the FX Minimizer's error reports and fixes a few other issues.

Test Plan: CI

Differential Revision: D38900309

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83833
Approved by: https://github.com/yuhc, https://github.com/Chillee
2022-09-01 18:39:26 +00:00
Horace He
85931eaa6b Rename fake_result to val (#84331)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84331
Approved by: https://github.com/ezyang
2022-08-31 17:44:18 +00:00
Sungmin Cho
bf67589915 Escape curly brackets in FxGraphDrawer _typename (#83604)
Summary:
Encountered `Error: bad label format` from dot (i.e. graphviz) when benchmarking models that have dict-like structure.

The root cause was that curly brackets were not properly escaped, as in example P522499127 (unescaped curly brackets in the target= string).

This diff inserts the fix in FxGraphDrawer, since much of the graph-generation code relies on that class.

(Modified summary before exporting to GitHub PR)
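
The fix boils down to escaping `{`, `}` (and `|`) before they reach graphviz record labels; a minimal illustration (the helper name is hypothetical):
```
def escape_dot_label(text: str) -> str:
    # Graphviz record labels treat {, }, and | as structure, so they must be
    # backslash-escaped before being embedded in a node label.
    return text.replace("{", r"\{").replace("}", r"\}").replace("|", r"\|")

print(escape_dot_label("Dict[str, Tensor]{...}"))  # Dict[str, Tensor]\{...\}
```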

Test Plan:
```
CUDA_VISIBLE_DEVICES=7 buck run mode/opt -c python.package_style=inplace //hpc/new/models/feed/benchmark:feed_lower_benchmark -- --model-name={INSERT IFR QE MODEL NAME HERE} --batch-iter 100 --batch-size 768 --num-gpu 1 --lower-presets {INSERT ITS PRESET}
```

Will not encounter dot errors after this diff.

(Modified test plan before exporting to GitHub PR)

Reviewed By: yinghai

Differential Revision: D38758827

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83604
Approved by: https://github.com/yinghai, https://github.com/jianyuh
2022-08-31 15:15:21 +00:00
Isaac Hoffman
20018aa766 modify split_by_tags to retain output order (#84136)
Summary: Currently `split_by_tags` determines submodule output order by iterating over `used_in_main`. Since this is a `Set`, insertion order is not retained, so we run into problems with submodule output order being "randomized" & inconsistent between splits. By using `Dict[Node, None]` we can implement `used_in_main` as an ordered set so that output order is consistent when splitting the same model.
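
The dict-as-ordered-set trick relies on Python dicts preserving insertion order; a small illustration:
```
# A set loses insertion order; a dict keyed by the elements keeps it.
used_in_main = {}                 # Dict[Node, None] used as an ordered set
for node in ("c", "a", "b"):      # stand-ins for fx Nodes
    used_in_main.setdefault(node, None)

print(list(used_in_main))         # ['c', 'a', 'b'] -- deterministic output order
```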

Test Plan: CI

Differential Revision: D39039268

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84136
Approved by: https://github.com/houseroad
2022-08-30 20:36:33 +00:00
Zhengxu Chen
a402e100be [fx] Make wrapped_fn also work for non-mutating passes. (#84232)
Summary: Before this change, wrapped_fn was only supposed to take mutating passes, but we don't actually have any way to detect whether a pass is mutating before running it. To avoid a precondition that depends on the PassManager run, we relax it to accept any kind of pass and conditionally return the original pass based on the pass result.

Test Plan: eyes

Reviewed By: qihqi, angelayi

Differential Revision: D39086343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84232
Approved by: https://github.com/angelayi
2022-08-30 01:16:58 +00:00
Angela Yi
352da6de6b [fx][pass] Fix type of exception (#84094)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84094
Approved by: https://github.com/SherlockNoMad
2022-08-29 16:55:59 +00:00
PyTorch MergeBot
1945d28f58 Revert "[fx][pass] Fix type of exception (#84094)"
This reverts commit eb2fa2e042.

Reverted https://github.com/pytorch/pytorch/pull/84094 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-29 16:41:09 +00:00
Angela Yi
eb2fa2e042 [fx][pass] Fix type of exception (#84094)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84094
Approved by: https://github.com/SherlockNoMad
2022-08-26 22:34:14 +00:00
Nan Xiao
c47e0450f8 [fbia] Keep Track of full qualified name before and after remote sharding (#83889)
Summary: Track qualname changes in embedding sharding & FX split, and compose the target qualname at the end of the FBIA transform stage, so we can use the qualname mapping in the XL materialize stage.

Test Plan:
CI/CD

with DISABLE_XLEBB_MATERIALIZATION = True
https://fburl.com/fblearner/a8yljbux

with DISABLE_XLEBB_MATERIALIZATION = False
https://fburl.com/fblearner/2nvi0dam

Reviewed By: lliu315gt

Differential Revision: D38772525

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83889
Approved by: https://github.com/houseroad
2022-08-24 01:15:25 +00:00
Shirong Wu
fc470cf980 Back out "Support regex-style matching for Any and Oneof (#82853)" (#83922)
Reviewed By: hl475

Differential Revision: D38945806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83922
Approved by: https://github.com/hl475
2022-08-24 00:17:46 +00:00
Angela Yi
89072177e1 [fx][pass infra] Adding error catching (#83933)
Example:

```
======================================================================
ERROR: test_pass_manager_error (fx.test_pass_infra.TestPassManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/angelayi/Projects/pytorch/torch/fx/passes/infra/pass_manager.py", line 285, in __call__
    res = fn(module)
  File "/Users/angelayi/Projects/pytorch/test/fx/test_pass_infra.py", line 164, in pass_fail
    raise RuntimeError("bad")
RuntimeError: bad

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/angelayi/Projects/pytorch/test/fx/test_pass_infra.py", line 170, in test_pass_manager_error
    pm(traced_m)
  File "/Users/angelayi/Projects/pytorch/torch/fx/passes/infra/pass_manager.py", line 289, in __call__
    raise RuntimeError(msg) from e
RuntimeError: An error occured when running the 'pass_fail' pass after the following passes: ['replace_add_with_mul_pass', 'replace_mul_with_div_pass']
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83933
Approved by: https://github.com/SherlockNoMad
2022-08-23 23:56:50 +00:00
Brian Hirsh
8db04c1113 reinplace pass: special handling for view_scatter ops (#83846)
There is already special handling in the reinplacing pass for removing `{view}_scatter` ops, but there is another case that needs special handling. In this code:
```
         def f():
             a = torch.zeros(4, 4, 4)
             a[:, 2:] = torch.ones(4, 2, 4)
             return a
```

Tracing normally with `make_fx()` gives you:
```

def forward(self):
    zeros = torch.ops.aten.zeros.default([4, 4, 4], device = device(type='cpu'), pin_memory = False)
    ones = torch.ops.aten.ones.default([4, 2, 4], device = device(type='cpu'), pin_memory = False)
    slice_tensor = torch.ops.aten.slice.Tensor(zeros, 0, 0, 9223372036854775807)
    slice_tensor_1 = torch.ops.aten.slice.Tensor(slice_tensor, 1, 2, 9223372036854775807);  slice_tensor = None
    copy__default = torch.ops.aten.copy_.default(slice_tensor_1, ones);  slice_tensor_1 = ones = None
    return zeros
```
Functionalizing it gives you:

```
def forward(self):
    zeros = torch.ops.aten.zeros.default([4, 4, 4], device = device(type='cpu'), pin_memory = False)
    ones = torch.ops.aten.ones.default([4, 2, 4], device = device(type='cpu'), pin_memory = False)
    slice_tensor = torch.ops.aten.slice.Tensor(zeros, 0, 0, 9223372036854775807)
    slice_tensor_1 = torch.ops.aten.slice.Tensor(slice_tensor, 1, 2, 9223372036854775807);  slice_tensor = None
    slice_tensor_2 = torch.ops.aten.slice.Tensor(zeros, 0, 0, 9223372036854775807)
    slice_scatter_default = torch.ops.aten.slice_scatter.default(slice_tensor_2, ones, 1, 2, 9223372036854775807);  slice_tensor_2 = ones = None
    slice_scatter_default_1 = torch.ops.aten.slice_scatter.default(zeros, slice_scatter_default, 0, 0, 9223372036854775807);  zeros = slice_scatter_default = None
    return slice_scatter_default_1
```

Notice that there are not any functional ops to directly re-inplace! What actually happened is that functionalization turned the `copy_()` into a `copy()`, but the out-of-place `copy()` operator gets optimized away because it's a no-op (when the input and output metadata are the same, `out = copy(a, b)` just returns `b`).

What we actually want is to replace this line:
```
slice_scatter_default = torch.ops.aten.slice_scatter.default(slice_tensor_2, ones, 1, 2, ...);
```
with this:
```
new_slice = torch.ops.aten.slice.Tensor(slice_tensor_2, 1, 2, ...);
_ = torch.ops.aten.copy_.default(new_slice, ones)
```

In the above, we're taking a fresh slice of the "base" tensor, and performing a `copy_()` on the slice, adding back what functionalization removed.

We actually need to create a fresh "slice" node, because we're not guaranteed that one already exists in the graph (technically there should be one, but it might have been DCE'd by the time we hit re-inplacing).

I also updated the docs for re-inplacing to more closely match the order of the logic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83846
Approved by: https://github.com/ezyang
2022-08-23 17:13:58 +00:00
Brian Hirsh
75ec7b7547 reinplace pass: bugfix for output node replacement (#83845)
Cleaned up some of the arg replacement logic to use tree_map, so it handles FX nodes that have nested containers.

See the added test: when you write a function that returns a list, the `output` node in the FX graph shows up as having `node.args = tuple(immutable_list(...))`
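
For illustration, swapping one node for another across such nested containers can be written with pytree's tree_map (a sketch of the technique, not the pass's actual code; `swap_node_in_args` is hypothetical):
```
from torch.fx import Node
from torch.utils._pytree import tree_map

def swap_node_in_args(node: Node, old: Node, new: Node) -> None:
    """Replace `old` with `new` anywhere in node.args/kwargs, including inside
    nested (immutable) lists such as an `output` node returning a list."""
    swap = lambda a: new if a is old else a
    node.args = tree_map(swap, node.args)
    node.kwargs = tree_map(swap, node.kwargs)
```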

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83845
Approved by: https://github.com/ezyang
2022-08-23 17:13:58 +00:00
Alex Beloi
3c6c39e66e [fx] refactor fba_passes into FBAPassManagerBuilder (#83268)
Summary:
This diff integrates FBAPassManagerBuilder as the primary orchestrator of FBA-FX passes.

Reviewed By: jfix71, dborkovic

Differential Revision: D38186354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83268
Approved by: https://github.com/dborkovic
2022-08-22 06:54:18 +00:00
Brian Hirsh
e9e7363854 reinplacing pass fixes for torchbench + huggingface (#83626)
I'm testing out turning on re-inplacing + functionalization by default with the AOTAutograd + eager backend on torchbench + huggingface models. This PR contains a few bug fixes from turning re-inplacing on:

(1) Handle more gracefully when FakeTensorMode is already turned on when you call reinplace

(2) More robust detection of when an inplace variant of an op exists (the dumb bug was that `pow.Scalar` doesn't have an inplace variant, even though there are several overloads of `pow_`; none of them are eligible, though).

(3) Avoid re-inplacing when it would require resizing the input buffer. This isn't allowed, because inplace ops aren't allowed to resize their inputs.

For the last one, I gave the two main examples in more detail in the comments. Important cases are:
```
# This should not be re-inplaced at all; the op broadcasts, so this would require resizing the self tensor
torch.add(tensor[1, 4], tensor[4, 4])

# This should not be re-inplaced, because the inplace and out-of-place variants of the op return different dtypes
torch.ge(a, b)
# However, this means that today, when functionalization functionalizes a `torch.ge_(a, b)` call, reinplacing won't properly de-functionalize it. I mentioned in the comments that this optimization is worth adding later.
```

(4) There's some logic around keeping `storage_to_nodes` up to date when we see a view op: if we re-inplace `out = a.add(...)`, and later in the program we encounter a "later_node", `out.view(..)`, and need to replace it with `a.view(...)`, then we need to update some metadata structures. I had to fix that logic: specifically, if "later_node" isn't a dispatcher op (e.g. if it's an FX output node), I wasn't properly handling the case where the node's fake_meta info was not a tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83626
Approved by: https://github.com/ezyang
2022-08-19 23:30:45 +00:00
Sherlock Huang
39e6238788 Support regex-style matching for Any and Oneof (#82853)
pseudo.any is a wildcard node that can be matched with any fx node with an arbitrary number of inputs and outputs.
For example, to match relu followed by one fx node:
```
    def pattern(a):
        y = a.relu()
        z = torch.ops.pseudo.any(y)
        return z
```

pseudo.oneof is a special node that can be matched with a fx node whose target is in the permissible list.
`targets` must be a list of qualified names for operators, e.g. ["operator.add", "torch.sigmoid",
"torch.ops.aten.foo", "torch.ops.prims.bar"]

For example, using the following pattern with pseudo.oneof
```
    def pattern(a):
        y = a.relu()
        z = torch.ops.pseudo.oneof(y, targets=["relu", "torch.sigmoid", "operator.add"])
        return z
```

It will have 3 matches in the following function
```
    def forward(y):
        z = y.relu()
        x = z.relu()    # first match

        x = x.relu()
        x = torch.sigmoid(x)    # second match

        x = x.relu()
        return x + 1    # third match
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82853
Approved by: https://github.com/ezyang
2022-08-12 18:43:13 +00:00
Sherlock Huang
2ca721cda5 An improved version of subgraph matcher (#82090)
This new version of subgraph matcher further supports
- optionally match with pattern's placeholder and output nodes
- patterns with multiple outputs
- filtering out non-containing matches
- filtering out overlapping matches

TODOs:
- [x] Update replace_pattern() to use this matcher
- [x] Fix cases with identical anchor
- [x] Introduce wildcard matching, such as Any, OneOf
- [ ] Improve node comparer to match args and kwargs values
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82090
Approved by: https://github.com/ezyang
2022-08-12 03:32:09 +00:00
Sergii Dymchenko
a0b3854548 Change seperate -> separate (#83056)
One instance was caught by Meta-internal "exact-word-misspell" linter in D38505529.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83056
Approved by: https://github.com/huydhn, https://github.com/seemethere
2022-08-09 23:11:34 +00:00
Horace He
51bbf6329a Improved legalize_graph pass in FX (#82874)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82874
Approved by: https://github.com/jamesr66a
2022-08-07 00:13:17 +00:00
Shirong Wu
4ae40d74ac Back out "Add an op_lowering_disallow_list in fx splitter base class. (#82288)" (#82750)
Summary:
Revert since this breaks BC test
More context:
failing test
https://www.internalfb.com/.../fblearner/details/361780349/
issue report thread
https://fb.workplace.com/groups/2211200152361974/permalink/2303690223112966/

Test Plan: All unit test

Differential Revision: D38399966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82750
Approved by: https://github.com/yinghai
2022-08-05 02:15:00 +00:00
Brian Hirsh
d362b8e9e6 reland "add a reinplacing FX pass (#80897)" (#82407)
fixes #81457
fixes #81216
fixes #81212
fixes #81207
fixes #81206
fixes #81218
fixes #81203
fixes #81202
fixes #81214
fixes #81220
fixes #81205
fixes #81200
fixes #81204
fixes #81221
fixes #81209
fixes #81210
fixes #81215
fixes #81217
fixes #81222
fixes #81211
fixes #81201
fixes #81208

As part of this PR I'm also re-enabling all of the functionalization tests that got marked as flaky in CI (they're not actually flaky - I think they got marked because a PR that should have changed their expect-test output made it to master without the changes. I'll let CI run on this PR to confirm though).

reland of https://github.com/pytorch/pytorch/pull/80897
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82407
Approved by: https://github.com/ezyang
2022-08-02 18:03:29 +00:00
Shirong Wu
09059d9148 integrate plugin (#82395)
Differential Revision: D38162861

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82395
Approved by: https://github.com/frank-wei
2022-08-02 00:41:36 +00:00
Angela Yi
e06d1029f7 [fx] Minor modifications to pass infra (#82485)
* Made PassBase calls optionally return PassResult since some passes might want to be in-place.
* Added additional documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82485
Approved by: https://github.com/SherlockNoMad
2022-08-01 20:10:01 +00:00
Ying Zhang
a71d0e882c Add an op_lowering_disallow_list in fx splitter base class. (#82288)
Summary: As titled, so that we can choose not to lower some specific ops.

Test Plan: Tested together with the next diff in stack.

Differential Revision: D38188836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82288
Approved by: https://github.com/mikeiovine, https://github.com/khabinov
2022-07-28 05:19:33 +00:00
PyTorch MergeBot
df36ccbd81 Revert "add a reinplacing FX pass (#80897)"
This reverts commit 3ef7a6921d.

Reverted https://github.com/pytorch/pytorch/pull/80897 on behalf of https://github.com/malfet due to broke windows trunk tests, see 3ef7a6921d
2022-07-27 22:32:03 +00:00
Brian Hirsh
3ef7a6921d add a reinplacing FX pass (#80897)
Adds a "reinplacing" FX transform, that goes through an FX graph and tries to convert out-of-place op calls into inplace calls whenever possible.

Followups from this PR include:
- Set up torch bench, and run the whole torchbench suite using AOTAutograd + functionalize + re-inplacing transforms to surface any issues (this is what I'm currently working on). Right now, I have some basic unit tests just to sanity check that the general logic makes sense.
- Add any missing inplace ops. This is mostly the `*_scatter*` ops, e.g. `diagonal_scatter_`, because these ops will commonly show up in an FX graph after running functionalization.

The criteria for when you can swap an op `b = a.add(...)` with `a.add_(...)` is:
(1) An inplace variant of the operator with the same schema needs to exist (`aten.add` -> `aten.add_`)
(2) `a` (**or any of its aliases**) can't be used as an input to any other operators later on in the graph
(3) `a` can't be one of the inputs to the entire graph. It also can't be an **alias** of any of the inputs ***

*** One thing to note: (3) means that we can't technically guarantee that we'll get back **all** memory usage that we lost from functionalization. Functionalization converts input mutations into out-of-place calls, and then adds a `copy_()` to the end of the graph to preserve semantics.

I added logic to handle `copy_()` in this PR because it's a pretty important optimization in the context of functionalization: any program that performs input mutations will have a `copy_()` in it after running functionalization.

There are some examples in the test file, but I think staring at an example of where re-inplacing is/isn't allowed to run is helpful:
```
// Before functionalization
def foo(a):
    tmp1 = a.add_(1)
    tmp2 = a.add(2)

// After functionalization
def foo(a)
    tmp1 = a.add(1)
    tmp2 = a.add(2)
    ....
    a.copy_(tmp1)

// After re-inplacing
def foo(a)
    // first add() is safe to re-inplace even though a is a program input,
    // because a's data is overwritten later by a copy_()
    tmp1 = a.add_(1)
    // second add() is NOT safe to re-inplace, because:
    // (1) a and tmp1 are aliased. Note that they weren't aliased in the original program,
    //     but they are now that we've done some re-inplacing.
    // (2) tmp1 is used as an input later in the program
    tmp2 = a.add(2)
    ....
    a.copy_(tmp1)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80897
Approved by: https://github.com/ezyang
2022-07-27 19:11:15 +00:00
Sherlock Huang
dc3c1ade4b Some fixes for FX pass with nvFuser backend (#81911)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81911
Approved by: https://github.com/jjsjann123, https://github.com/IvanYashchuk, https://github.com/davidberard98
2022-07-22 19:49:33 +00:00
Edward Z. Yang
3c2c2cc947 cudagraphs dynamo backend (#80566)
This backend handles cases where the preexisting cuda graphs
implementation from dynamo is unsound/has errors.

Requires this functorch bug fix: https://github.com/pytorch/functorch/pull/935

Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80566
Approved by: https://github.com/ngimel, https://github.com/wconstab
2022-07-22 14:06:07 +00:00
Shangdi Yu
c52ee6dc0a CSE Pass and common pass Tests (#81742)
Test cases for CSE Pass and common passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81742
Approved by: https://github.com/SherlockNoMad
2022-07-22 03:45:09 +00:00
Sherlock Huang
43e7fee764 [Reland] Recursively print graph module and its submodule (#81639)
ghstack-source-id: fcfc024c440981ee3fe3537a5816089eadf2cc13
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81080

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81639
Approved by: https://github.com/ezyang
2022-07-21 16:58:25 +00:00
Shangdi Yu
7c5dac5228 Dialect agnostic CSE Pass (#81530)
Fixes comments in https://github.com/pytorch/pytorch/pull/81512

- banned ops are an input to the pass (a simplified sketch of the overall CSE idea follows below)
- update fx/readme.md to include this file for better discoverability
- use make_fx from the torch repo
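
For context, a much-simplified sketch of the CSE idea over an FX graph (the real pass also respects the banned-ops input and handles more cases; `naive_cse` is hypothetical):
```
import torch
from torch.fx import GraphModule, symbolic_trace

def naive_cse(gm: GraphModule, banned_ops=frozenset()) -> GraphModule:
    seen = {}
    for node in list(gm.graph.nodes):
        if node.op != "call_function" or node.target in banned_ops:
            continue
        key = (node.target, node.args, tuple(sorted(node.kwargs.items())))
        try:
            hash(key)
        except TypeError:
            continue  # unhashable args; skip for this sketch
        if key in seen:
            # Identical pure call already computed: reuse it, drop the duplicate.
            node.replace_all_uses_with(seen[key])
            gm.graph.erase_node(node)
        else:
            seen[key] = node
    gm.recompile()
    return gm

def f(x):
    return torch.cos(x) + torch.cos(x)   # the second cos is redundant

gm = naive_cse(symbolic_trace(f))
```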
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81530
Approved by: https://github.com/SherlockNoMad
2022-07-20 00:56:41 +00:00
PyTorch MergeBot
4035a53cca Revert "Recursively print graph module and its submodule (#81080)"
This reverts commit fe7262329c.

Reverted https://github.com/pytorch/pytorch/pull/81080 on behalf of https://github.com/DanilBaibak due to Break internal build
2022-07-18 14:46:26 +00:00
Sherlock Huang
fe7262329c Recursively print graph module and its submodule (#81080)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81080
Approved by: https://github.com/ezyang
2022-07-18 01:19:03 +00:00
Sherlock Huang
d625637c7c Include aten.where.self in NvFuserOperatorSupport (#81436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81436
Approved by: https://github.com/davidberard98
2022-07-16 03:29:27 +00:00
Shangdi Yu
938643b8bc CSE_Pass (#81512)
Migrate the CSE pass in functorch to pytorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81512
Approved by: https://github.com/angelayi
2022-07-15 02:32:48 +00:00
Angela Yi
3d0b0b2f9b [fx] PassManager changes (#80531)
PassManager is a class used to run multiple passes on a given graph module.

Class Attributes
* `passes: List[Callable]`: A list of callable passes
* `constraints: List[Callable]`: A list of constraints
* `run_checks_after_each_pass`: Flag for running checks each pass

Class Methods:
* `__call__(graph_module: DispatchGraphModule)`:
    * Runs the passes in order until the graph stops changing, or at most `steps` times.
    * Each time a pass is run, it will check that the graph module still maintains the required invariants by calling `check()` and will lint the graph to check that it’s well formed if the flag `run_checks_after_each_pass` is set.
* `check(graph_module: DispatchGraphModule)`: Runs various checks on the given graph module to make sure that it contains the needed data for passes
* `add_check(check: Callable)`: Adds the `check` function to the given pass manager instance
* `add_constraint(constraint: Callable)`: Adds a constraint to the current list of constraints

We can create a PassManager and run it by doing:
```
PassManager(passes=[pass1, pass2])(graph_module)
```

Differential Revision: [D37523159](https://our.internmc.facebook.com/intern/diff/D37523159)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80531
Approved by: https://github.com/SherlockNoMad
2022-07-15 00:58:43 +00:00
jjsjann123
cc67a92e74 fixing call_module on subscripting into generator (#81258)
named_modules() returns a generator, which is not subscriptable and causes the node support query to fail.
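
The failure mode and the fix, in miniature:
```
import torch
from torch.fx import symbolic_trace

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)
    def forward(self, x):
        return self.lin(x)

gm = symbolic_trace(M())
# gm.named_modules() is a generator, so gm.named_modules()["lin"] raises TypeError.
submodules = dict(gm.named_modules())   # materialize once...
lin = submodules["lin"]                 # ...then lookup by qualified name works
```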
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81258
Approved by: https://github.com/SherlockNoMad
2022-07-14 16:41:18 +00:00
Angela Yi
614779f975 [fx] PassResult (#81366)
Passes should now return a `PassResult` which (for now) contain the following fields:
* `graph_module`: The graph module modified during the pass
* `modified`: A flag for if the graph module has been modified
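
A pass written against this convention might look like the following sketch (the import path is an assumption):
```
import torch
# Import path is an assumption; PassResult may live elsewhere under torch.fx.passes.
from torch.fx.passes.infra.pass_base import PassResult

def replace_add_with_mul_pass(gm):
    modified = False
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target is torch.add:
            node.target = torch.mul
            modified = True
    gm.recompile()
    return PassResult(gm, modified)
```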
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81366
Approved by: https://github.com/SherlockNoMad
2022-07-13 02:03:11 +00:00
Sherlock Huang
6b280e880a Update NvFuserOperatorSupport (#81311)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81311
Approved by: https://github.com/davidberard98
2022-07-12 21:19:37 +00:00
Sherlock Huang
fc10a63727 Prims+NvFuser Backend Prototype (#80591)
This PR integrates FX graph partitioner + Aten2Prims DecompositionInterpreter + Prims' TraceExecutor + naive caches for nvFuser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80591
Approved by: https://github.com/jjsjann123, https://github.com/ezyang
2022-07-08 19:53:03 +00:00
anjali411
4bf076e964 Add __all__ to torch.distributed, futures, fx, nn, package, benchmark submodules (#80520)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80520
Approved by: https://github.com/rohan-varma
2022-07-08 14:31:24 +00:00
Drazen Borkovic
9402219a36 Move serialize_module() out of OSS graph_manipulation.py to internal (#80785)
Differential Revision: D37582495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80785
Approved by: https://github.com/jfix71
2022-07-05 23:39:13 +00:00
Riley Dulin
d579838eb5 [torch][fx] Add ignore_parameters_and_buffers kwarg to FxGraphDrawer (#79982)
Summary:
Add an `ignore_parameters_and_buffers` parameter which will tell the graph drawer
to leave off adding parameter and buffer nodes in the dot graph.

This is useful for large networks, where we want to view the graph to get an idea of
the topology and the shapes without needing to see every detail. Removing these buffers
de-clutters the graph significantly without losing much information.
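
Usage might look like this (the kwarg name is taken from the title; defaults and the drawer's dependencies, e.g. pydot/graphviz, are assumptions):
```
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.graph_drawer import FxGraphDrawer

traced = symbolic_trace(torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()))
# Hide parameter/buffer get_attr nodes to de-clutter large graphs.
drawer = FxGraphDrawer(traced, "my_model", ignore_parameters_and_buffers=True)
drawer.get_dot_graph().write_svg("my_model.svg")   # requires graphviz installed
```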

Reviewed By: jfix71

Differential Revision: D37317917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79982
Approved by: https://github.com/jfix71
2022-06-29 22:48:43 +00:00
Sherlock Huang
ac5a94789f Refactor lift_subgraph_as_module as a fx.passes.util function (#80292)
lift_subgraph_as_module can be shared between fuser_utils.py and split_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80292
Approved by: https://github.com/jjsjann123, https://github.com/842974287
2022-06-29 22:35:39 +00:00
PyTorch MergeBot
58532256e9 Revert "Add __all__ for torch.distributed and fx modules (#80460)"
This reverts commit 5d40c3d5c8.

Reverted https://github.com/pytorch/pytorch/pull/80460 on behalf of https://github.com/malfet due to Broke MacOS testing, see https://github.com/pytorch/pytorch/runs/7105579664?check_suite_focus=true
2022-06-29 16:20:55 +00:00
anjali411
5d40c3d5c8 Add __all__ for torch.distributed and fx modules (#80460)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80460
Approved by: https://github.com/albanD, https://github.com/rohan-varma
2022-06-29 02:53:56 +00:00
anjali411
f68f77610a Add __all__ to torch.nn.quantized, fx.passes, ao.nn and amp submodules (#80376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80376
Approved by: https://github.com/albanD
2022-06-27 21:36:27 +00:00
anjali411
3bcc19b29a Add __all__ to various submodules in torch.fx, distributions, distributed, package (#80367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80367
Approved by: https://github.com/albanD
2022-06-27 21:27:30 +00:00
Sherlock Huang
752c06e0e1 FX graph partitioner and fuser (#79439)
This PR introduces two components.

CapabilityBasedPartitioner for FX graphs: given a list of supported operators, this partitioner tries to form the largest subgraphs that only contain the supported ops.

Fuser utility: given a list of nodes in an FX graph, it lifts them into a sub-GraphModule within the original graph.
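
A rough usage sketch (module paths, the OperatorSupport construction, and the partitioner's method names are assumptions, not necessarily this PR's exact API):
```
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.infra.partitioner import CapabilityBasedPartitioner
from torch.fx.passes.operator_support import OperatorSupport

def f(x):
    return torch.sigmoid(torch.relu(x) + 1)   # sigmoid is "unsupported" below

# Qualified names of the ops the backend claims to support.
support = OperatorSupport({"torch.relu": None, "_operator.add": None})

gm = symbolic_trace(f)
partitioner = CapabilityBasedPartitioner(gm, support)
partitions = partitioner.propose_partitions()
fused_gm = partitioner.fuse_partitions(partitions)   # supported ops lifted into submodules
```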

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79439
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
2022-06-24 18:49:37 +00:00
Oleg Khabinov
848af37209 Debug small ACC subgraphs elimination (#80117)
Reviewed By: yinghai

Differential Revision: D37368729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80117
Approved by: https://github.com/yinghai, https://github.com/houseroad
2022-06-23 18:45:24 +00:00
Yinghai Lu
5a559a547d [FX] fix split_util by using getattr_recursive instead of getattr (#80011)
Summary: If the model contains a ModuleList, it's possible that some of the weight attributes come out as module.sub.0.weight. Plain `getattr` doesn't work in this case, and we have a dedicated function `getattr_recursive` for that. Just use it.
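
For illustration, a recursive attribute fetch over dotted, possibly index-containing names (a sketch; the actual helper in fx may differ):
```
import functools
import torch

def getattr_recursive(obj, qualname: str):
    # "sub.0.weight" -> getattr(getattr(getattr(obj, "sub"), "0"), "weight");
    # nn.ModuleList resolves the digit segment through its _modules mapping.
    return functools.reduce(getattr, qualname.split("."), obj)

m = torch.nn.Module()
m.sub = torch.nn.ModuleList([torch.nn.Linear(2, 2)])
w = getattr_recursive(m, "sub.0.weight")   # plain getattr(m, "sub.0.weight") fails
```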

Reviewed By: houseroad

Differential Revision: D37326955

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80011
Approved by: https://github.com/houseroad
2022-06-23 03:35:46 +00:00
Angela Yi
8155753f8c Added PassBase implementation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79878

Approved by: https://github.com/SherlockNoMad
2022-06-22 16:37:08 +00:00
Drazen Borkovic
f54098cd3e Create JSON from new FX IR and lower to LLVM (#77765)
Summary:
Replace TensorView objects with maps for JSONing.
Lower to LLVM.

Reviewed By: jaybean-dev, jfix71

Differential Revision: D36318989

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77765
Approved by: https://github.com/jfix71, https://github.com/jamesr66a
2022-05-20 03:20:57 +00:00
Jordan Fix
0c91efb64e [fx/graph_manipulation] Fix _update_weight_fused_dtypes (#77702)
Summary: D36335238 (18e36a6295) wasn't fully working due to the previous impl, which used op names to look for matches. Instead, use the FX graph itself.

Differential Revision: D36462875

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77702
Approved by: https://github.com/jamesr66a
2022-05-19 03:28:28 +00:00
Jordan Fix
18e36a6295 [graph_manipulation] Set fused dtypes for all constant params/buffers (#77401)
Summary: We were handling constant attrs in a few different ways before, leading to confusion and missed handling of fused dtypes. This diff consolidates some of that code and unbreaks current breakage.

Test Plan: CI. Recently broken tests now pass.

Differential Revision: D36335238

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77401
Approved by: https://github.com/jaybean-dev, https://github.com/jamesr66a
2022-05-17 07:42:29 +00:00
Shirong Wu
ea8a0184b7 Fix fuse_parallel_linear (#76202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76202

move legalize_graph to a common tool class.

Reviewed By: yinghai, jfix71, 842974287

Differential Revision: D35694145

fbshipit-source-id: b044df3b46b3029c383581f7853a4338c2b13c62
(cherry picked from commit 49884d557d220f981f5f894bdcd381df749e3efb)
2022-04-22 18:59:07 +00:00
Alex Beloi
78ba87ec4b [fx][ShapeProp] make shapes and args/kwargs concrete for minimizer (#75291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75291

As the title. Also adds debugging messaging to `ShapeProp.run_node`

Reviewed By: yuhc

Differential Revision: D34930081

fbshipit-source-id: ea4341ac2377b7b81404b14afeb5149d5556d92c
(cherry picked from commit 8a929a910c17cff69fac501c5b9260bb23f260e2)
2022-04-06 07:57:38 +00:00
Alex Beloi
fd2050900b [fx][1/2] add PassManager and refactor AFG/AGM (#74972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74972

This diff
* adds PassManager and supporting logic

Test Plan:
CI and

```
buck test //caffe2/torch/fx/passes:test_pass_manager
```
```
Building: finished in 3.1 sec (100%) 124/124 jobs, 30/124 updated
  Total time: 3.7 sec
More details at https://www.internalfb.com/intern/buck/build/4f947267-671c-48bc-ad07-190e5a731d2d
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 1423fed7-4674-44ce-9b84-c634f28a0406
Trace available for this run at /tmp/tpx-20220309-144735.217835-1423fed7-4674-44ce-9b84-c634f28a0406/trace.log
RemoteExecution session id: reSessionID-1423fed7-4674-44ce-9b84-c634f28a0406-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/6473924544097816
    ✓ ListingSuccess: caffe2/torch/fx/passes:test_pass_manager : 3 tests discovered (0.639)
    ✓ Pass: caffe2/torch/fx/passes:test_pass_manager - test_these_before_those_pass_constraint (caffe2.torch.fx.passes.tests.test_pass_manager.TestPassManager) (0.335)
    ✓ Pass: caffe2/torch/fx/passes:test_pass_manager - test_this_before_that_pass_constraint (caffe2.torch.fx.passes.tests.test_pass_manager.TestPassManager) (0.336)
    ✓ Pass: caffe2/torch/fx/passes:test_pass_manager - test_pass_manager_builder (caffe2.torch.fx.passes.tests.test_pass_manager.TestPassManager) (0.344)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/6473924544097816
```

Reviewed By: yuhc, wushirong

Differential Revision: D31484770

fbshipit-source-id: 7a8cde4c23727ff612bf7bf0d7b7db5d0c47b1a9
(cherry picked from commit c281c288fe870624574d34cfc93d732d4607f7d0)
2022-04-01 09:12:47 +00:00
Chao Gu
bdf468b94d [FX] Fix type of argument min_acc_module_size (#74891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74891

As title, otherwise the below error is thrown:
```
TypeError: '>=' not supported between instances of 'int' and 'str'
```

Test Plan: easy

Reviewed By: jackm321

Differential Revision: D35206473

fbshipit-source-id: 200c83b9a19b6aae6f0da03abe99121e55893fd3
(cherry picked from commit 20744d2ce59ea07ecdb2570929dd5344c65b751a)
2022-03-29 17:48:32 +00:00
James Reed
214951bc6b [FX] Make split_module preserve proper placeholder names (#74736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74736

Previously, `split_module` would incorrectly carry over the `name` of placeholders rather than their `target`:

Original GraphModule

```
def forward(self, x, **kwargs):
    _kwargs = kwargs
    getitem = _kwargs['foo'];  _kwargs = None
    add = x + getitem;  x = getitem = None
    return add
```

After splitting:

```
def forward(self, x, _kwargs):
    submod_0 = self.submod_0(_kwargs);  _kwargs = None
    submod_1 = self.submod_1(x, submod_0);  x = submod_0 = None
    return submod_1
```

Notice that `**kwargs` is turned into `_kwargs`, which is incorrect and we lose the kwarg expansion behavior. This patch switches the erroneous code in `split_module`, resulting in the correct split code being emitted:

Original GraphModule

```
def forward(self, x, **kwargs):
    _kwargs = kwargs
    getitem = _kwargs['foo'];  _kwargs = None
    add = x + getitem;  x = getitem = None
    return add
```

After splitting:

```
def forward(self, x, **kwargs):
    _kwargs = kwargs
    submod_0 = self.submod_0(_kwargs);  _kwargs = None
    submod_1 = self.submod_1(x, submod_0);  x = submod_0 = None
    return submod_1
```

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D35137361

Pulled By: jamesr66a

fbshipit-source-id: 46d079cfe16093c293fc268404fb8bc86ffcf583
(cherry picked from commit a020066281856184621561a8672eb57f5de31e92)
2022-03-25 23:36:27 +00:00
Jerry Zhang
eaae62fed9 Make args work in the uru10x10_to_trt_eval script (#74707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74707

att

Test Plan:
```
buck run mode/dev-nosan -c fbcode.split-dwarf=true -c fbcode.platform=platform009 accelerators/workloads/models/uru10x10:uru_10x10_to_trt_eval -- -h
```

Reviewed By: 842974287

Differential Revision: D34088069

fbshipit-source-id: 5c89d25db6493e0f66f7e57aac24ed72196d0378
(cherry picked from commit d9d79f03e28d609a14ddc3e55b97c52b0e102438)
2022-03-25 03:52:47 +00:00
Jordan Fix
e99e3fa580 [fx/graph_drawer] Add skip_node_names_in_args option, default to True (#73815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73815

Add `skip_node_names_in_args` (default=`True`) which will skip including node names in args/kwargs during graph drawing.

Test Plan:
Compared graph renderings with the default (`skip_node_names_in_args=True`) vs. `skip_node_names_in_args=False` (screenshots omitted).

Reviewed By: wushirong

Differential Revision: D34659144

fbshipit-source-id: 9f0bd7bee98dc1ca8eecdabc960804564d83777b
(cherry picked from commit a0ed64b51f0187115586f4001dc81148c7ed18b9)
2022-03-08 01:50:03 +00:00
Ke Wen
d14de3139a [PyTorch FX] Return mapping of qualified names from split_module() (#73564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73564

While maintaining API backward compatibility, add an optional output parameter to split_module() that returns a mapping from the new qualified names in the modules after split to the old qualified names in the original module
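
Usage might look like this sketch, where `traced`, `root_module`, and `split_callback` stand for the usual split_module inputs and the dict-style output parameter is an assumption based on this description:
```
from torch.fx.passes.split_module import split_module

qualname_map = {}   # filled in by split_module: new qualified name -> old qualified name
split_gm = split_module(traced, root_module, split_callback, qualname_map=qualname_map)
for new_name, old_name in qualname_map.items():
    print(f"{new_name} was {old_name} in the original module")
```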

Test Plan:
1. Added a test (test_split_qualname_mapping) to test_fx_experimental.py to check the returned qualname mapping
```
$ python test_fx_experimental.py
...
Ran 1084 tests in 73.464s
OK (skipped=531, expected failures=4)
```
2. Ask test_fx.py to accept split_module's new signature
```
$ python test_fx.py --accept
```

Reviewed By: jamesr66a

Differential Revision: D34541792

fbshipit-source-id: e8ec7e77ec884e4db7cad0c0593e31861c76e42d
(cherry picked from commit d2e5a95a353ee5fb52cdba065f127489e9df47ae)
2022-03-02 23:32:54 +00:00
Shiyan Deng
5e86505693 Move util functions to a more common place (#73519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73519

Move `getattr_recursive()` and `setattr_recursive()` to fx main.

Test Plan: CI

Reviewed By: khabinov

Differential Revision: D34524723

fbshipit-source-id: a656e821d9dc1d446aa80cdc03a923bf0c05aeb5
(cherry picked from commit 4835965ac72d299487be14687823ea62394f4079)
2022-03-01 01:33:30 +00:00
Jordan Fix
b196e016a6 [fx/graph_drawer] Add args/kwargs and users (#73464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73464

- Improve formatting of graph by centering everything
- Add num_users
- Add args/kwargs
  - Don't print more than 10 of any list/tuple by default (this is necessary for very large concats)

Test Plan: tested locally

Reviewed By: khabinov

Differential Revision: D34492256

fbshipit-source-id: 8073992edb3efddcf8bfd72e2d3db49cc242db10
(cherry picked from commit b1b802965c143fdb0d308b70f51aa741f7d90f78)
2022-02-26 11:29:39 +00:00
Jordan Fix
540cb5fee2 [graph_manipulation] Unpack list of outputs (#72940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72940

att

Reviewed By: jackm321

Differential Revision: D34282062

fbshipit-source-id: 743710c18e1f38286d1b91c91868bb22c760f3ca
(cherry picked from commit fd2bdd189d)
2022-02-17 06:38:52 +00:00
Huamin Li
32dd4a8639 move fx_acc out of pytorch core (#72803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72803

as title

Reviewed By: jfix71

Differential Revision: D34101788

fbshipit-source-id: a9fd84671929af21405c049603e9895ec68de3d8
(cherry picked from commit e98fd1c32d)
2022-02-15 16:13:43 +00:00
Jordan Fix
4737ae7a16 [tools_common] Don't remove underscores from call_module targets in get_acc_ops_name (#72664)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72664

Test Plan: CI.

Reviewed By: wushirong

Differential Revision: D34148357

fbshipit-source-id: 9c75aaeae59461d7550fb00c6f98c879e98274f6
(cherry picked from commit 553525698a)
2022-02-11 08:32:10 +00:00
Shiyan Deng
2afed243b5 [fx2trt] remove split.py (#71933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71933

Add the functionalities provided by split.py to splitter_base.
- Propagate submodule inputs
- Create SplitResult to hold the split results.
Then removed split.py; to me this makes navigating the lowering code a bit easier.

Added default split and trace function for use.

Next step is to add better error handling for each stage during lowering and create unit tests for each stage. I'll probably make some bootcamp tasks for unit tests.

Test Plan: CI

Reviewed By: frank-wei, wushirong

Differential Revision: D33794322

fbshipit-source-id: f991893047a3701177f54cf22d9a6e48e0529472
(cherry picked from commit 1f3e13efba)
2022-02-08 03:31:25 +00:00
Alban Desmaison
6451e525e4 Revert D31316086: [fx-acc] PassManager
Test Plan: revert-hammer

Differential Revision:
D31316086 (aa4f048de9)

Original commit changeset: 4302c39e221c

Original Phabricator Diff: D31316086 (aa4f048de9)

fbshipit-source-id: 27554f001705d35ba38152fa36445d384b347b00
(cherry picked from commit 386c492d66)
2022-02-04 20:02:54 +00:00
Alex Beloi
aa4f048de9 [fx-acc] PassManager (#67261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67261

Adds a `Pass(callable)`, `PassManager`, `PassConstraint(callable)`, `PassManagerBuilder`.

The idea is that a `Pass` modifies an IR in place. `PassConstraint`s define a partial ordering on `Pass`es as a less-than callable. `PassManager` manages the collection of `Pass`es and `PassConstraint`s and ensures validation before execution. `PassManagerBuilder` creates `PassManager`s (example usage in a follow-up diff).

These are very loosely typed, so could be applied to different IRs as well as transformation between IRs.
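
A rough sketch of the intended usage (constraint and method names follow the tests listed below; exact signatures are assumptions):
```
from torch.fx.passes.pass_manager import (
    PassManager,
    this_before_that_pass_constraint,
)

def fuse_linear_pass(gm):        # illustrative in-place passes
    return gm

def dead_code_elim_pass(gm):
    return gm

pm = PassManager(passes=[fuse_linear_pass, dead_code_elim_pass])
# Declare a partial ordering: fusion must run before dead-code elimination.
pm.add_constraint(this_before_that_pass_constraint(fuse_linear_pass, dead_code_elim_pass))
pm.validate()   # checks the registered order against all constraints (name assumed)
```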

Test Plan:
```
buck test mode/opt //caffe2/torch/fx/passes:test_pass_manager
```
```
ore details at https://www.internalfb.com/intern/buck/build/210
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: c635415b-cdc4-4574-9739-a16d2b93ad3a
Trace available for this run at /tmp/tpx-20220203-114748.608700/trace.log
RemoteExecution session id: reSessionID-c635415b-cdc4-4574-9739-a16d2b93ad3a-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1970324927640328
    ✓ ListingSuccess: caffe2/torch/fx/passes:test_pass_manager : 3 tests discovered (0.332)
    ✓ Pass: caffe2/torch/fx/passes:test_pass_manager - test_this_before_that_pass_constraint (caffe2.torch.fx.passes.tests.test_pass_manager.TestPassManager) (0.232)
    ✓ Pass: caffe2/torch/fx/passes:test_pass_manager - test_these_before_those_pass_constraint (caffe2.torch.fx.passes.tests.test_pass_manager.TestPassManager) (0.231)
    ✓ Pass: caffe2/torch/fx/passes:test_pass_manager - test_pass_manager_builder (caffe2.torch.fx.passes.tests.test_pass_manager.TestPassManager) (0.231)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1970324927640328
```

Reviewed By: jfix71, kflu

Differential Revision: D31316086

fbshipit-source-id: 4302c39e221cfa43e2b2eda9f26d6d78da4db0f1
(cherry picked from commit 13c981ab00)
2022-02-04 19:07:42 +00:00
Shirong Wu
e03c3dd150 Add leaf module code example (#72100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72100

Facebook :
Enable the splitter to properly read the leaf modules specified by the acc_tracer leaf module list, and parse a leaf module as run_on_acc if a customized leaf module converter is provided.
Add a scratch board for customized leaf module converters and example code for a std_conv2d_same converter.

Reviewed By: jfix71

Differential Revision: D33698402

fbshipit-source-id: 01ce84ee1543f0fb8a8899256530ef1300797417
(cherry picked from commit 1357b2d528)
2022-02-03 02:07:00 +00:00
Yinghai Lu
4c62ffa11e Improve fx2trt benchmark (#72145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72145

- Added a predicate that allows us not to lower nodes with specific names.
- Added an observer function to help with the debugging

Reviewed By: jasonjk-park, houseroad

Differential Revision: D33785834

fbshipit-source-id: 7bdb7f33851da1118763c85f8e2121d01e4914a2
(cherry picked from commit 4e2268ed45)
2022-02-02 22:41:58 +00:00
Tomomas
b2b63209e1 simplify code in get buffers and parameters (#70399)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70399

Reviewed By: george-qi

Differential Revision: D33823535

Pulled By: khabinov

fbshipit-source-id: 8d1e49595da1f5cc14db7634a8c27556b02a5361
(cherry picked from commit 78bbb53614)
2022-01-28 19:15:56 +00:00
Yinghai Lu
e5794974cb [acc_tracer] Do not rewrite the leaf modules (#71790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71790

If a leaf module is specified, it means we should treat it as a blackbox and we should just avoid rewriting it too.

Test Plan:
```
buck test caffe2/test:test_fx_acc_tracer
```
with a new unit test.

Reviewed By: jfix71, houseroad, wushirong

Differential Revision: D33731903

fbshipit-source-id: 0560d9e8435b40f30d9b99dc3b2f47d1a04eb38b
(cherry picked from commit 747e9e44ee)
2022-01-26 07:32:04 +00:00
Shirong Wu
84d4087874 Fix trt const_fold as output use case (#71194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71194

Reviewed By: jfix71, khabinov

Differential Revision: D33541168

fbshipit-source-id: dd5787430b272977963323a6ce38b3e15e979278
2022-01-12 16:57:19 -08:00
James Reed
de902b5d02 [FX] Add a default_value arg to Graph.placeholder and fix split_module (#71016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71016

I found out that `split_module` doesn't preserve default values for arguments. In trying to fix that, I noticed that `Graph.placeholder` doesn't make it easy to add a default argument when making a placeholder. This PR addresses both of those issues
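
If the new argument is exposed as `default_value` on `Graph.placeholder` (my assumption from the description), creating a placeholder with a default might look like:
```
import torch
from torch.fx import Graph

g = Graph()
x = g.placeholder("x")
# default_value is the new argument (name assumed); this placeholder behaves
# like a parameter declared as `scale=2.0`.
scale = g.placeholder("scale", default_value=2.0)
out = g.call_function(torch.mul, (x, scale))
g.output(out)
```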

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D33482218

Pulled By: jamesr66a

fbshipit-source-id: 57ebcdab25d267333fb1034994e08fc1bdb128ee
2022-01-12 14:03:17 -08:00
Shiyan Deng
54fe2741a1 [fx2trt] break down div (#71172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71172

Break down div to smaller ops to make those div ops look like all other elementwise ops.

Use operator div ops instead of torch div if possible to avoid converting literal numbers to torch tensor (like in the following).
```
a = 1
b = 2

// `c` would be 0.5
c = a / b

// `c` would be torch.tensor([0.5])
c = torch.div(a, b)
```

The problem we saw on shufflenet is that there's a size op followed by a div op, which results in int64 tensors in the acc-traced graph (the acc tracer turns operator.div into acc_ops.div, which uses torch.div). The trt splitter then splits out the reshape op that consumes the div op, because we have a rule to split out ops that take int64 tensors as inputs.

Test Plan: Unit tests.

Reviewed By: wushirong

Differential Revision: D33482231

fbshipit-source-id: 508a171520c4e5b4188cfc5c30c1370ba9db1c55
2022-01-12 09:46:46 -08:00
Mostafa Elhoushi
3f53365086 define get_dot_graph (#70541)
Summary:
In the [docstring](https://github.com/pytorch/pytorch/blob/master/torch/fx/passes/graph_drawer.py#L54-L60) we mention `get_dot_graph` but it is not defined, so I defined it here.
Not sure if this is preferred, or whether we should update the docstring to use `get_main_dot_graph` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70541

Test Plan:
```
            g = FxGraphDrawer(symbolic_traced, "resnet18")
            with open("a.svg", "w") as f:
                f.write(g.get_dot_graph().create_svg())
```

Reviewed By: khabinov

Differential Revision: D33378080

Pulled By: mostafaelhoushi

fbshipit-source-id: 7feea2425a12d5628ddca15beff0fe5110f4a111
2022-01-05 20:00:20 -08:00
Shirong Wu
998daf44d6 Allow get_attr node to be int64 type (#68818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68818

Operator Support was blocking all nodes with dtype int64 from lowering. This diff eases the condition, allowing inputs from get_attr nodes (which are known not to be used for trt compute) to have dtype int64.

Reviewed By: brad-mengchi, 842974287

Differential Revision: D32609457

fbshipit-source-id: ea255f3281349a4254cb6abdeed671ab2c0216ba
2021-11-23 15:21:47 -08:00
Shirong Wu
7d38768d84 Rename splitter result (#68303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68303

The result of the splitter is run either on the accelerator or directly on the GPU; rename the GPU part of the graph to run_on_gpu.

Test Plan: buck test mode/opt caffe2/test:trt_tools_test

Reviewed By: 842974287

Differential Revision: D32392492

fbshipit-source-id: b085376c00c1097752e856e22c631d74a0fbc38f
2021-11-18 09:04:30 -08:00
Shirong Wu
799ebce3aa Add algo recorder/replayer to lower.py (#68194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68194

Add algorithm recorder/replayer to lower.py

Reviewed By: yinghai

Differential Revision: D31909575

fbshipit-source-id: 552f2ba4fbd6ea646316f6412d55416a76e1f69a
2021-11-11 21:22:22 -08:00
Shirong Wu
69adbc8778 Fix splitter_base and add unit test for trt splitter (#67569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67569

splitter_base assumed that the first subgraph after splitting must be the cpu subgraph if there exists a cpu node. This is wrong; the starting subgraph should be determined by which subgraph has the 0-dependency node.
Also add a unit test for the splitter.

Reviewed By: yinghai

Differential Revision: D32012549

fbshipit-source-id: e2639ccd7774b4295ca05c2ddbefff9726702b3f
2021-10-29 18:51:59 -07:00
Alex Beloi
74849d9188 [acc_shape_inference] add shape inference for quantize_per_channel (#66562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66562

Adding shape inference for `acc_ops.quantize_per_channel`, and fixing some bugs.

Bugs were related to the fact that `quantize_per_channel` arguments `scales` and `zero_points` take tensors, so when we fetch the values (which needs to be done using `.tolist()` instead of `.item()`) we may get either a list or a scalar value.
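
The `.item()` vs `.tolist()` distinction in a nutshell:
```
import torch

per_channel_scales = torch.tensor([0.1, 0.2, 0.3])
per_tensor_scale = torch.tensor(0.5)

per_channel_scales.tolist()   # [0.1, 0.2, 0.3]
per_tensor_scale.tolist()     # 0.5 -- a plain float for a 0-d tensor
# per_channel_scales.item()   # would raise: only one-element tensors convert to scalars
```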

Test Plan:
# Test Quantized Resnet
From sandbox with GPU that supports quantized types (tested with V100)
`buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test`
Output
```
...
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 0 MiB, GPU 1548 MiB
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 0 MiB, GPU 1548 MiB
[TensorRT] VERBOSE: Using cublasLt a tactic source
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 0, GPU 1556 (MiB)
[TensorRT] VERBOSE: Using cuDNN as a tactic source
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 0, GPU 1564 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] VERBOSE: Total per-runner device memory is 23405056
[TensorRT] VERBOSE: Total per-runner host memory is 73760
[TensorRT] VERBOSE: Allocated activation device memory of size 154140672
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 0 MiB, GPU 1736 MiB
trt fp16 time (ms/iter) 1.252899169921875
trt int8 time (ms/iter) 1.3774776458740234
trt implicit int8 time (ms/iter) 1.3835883140563965
PyTorch time (CUDA) (ms/iter) 4.34483528137207
PyTorch time (CPU) (ms/iter) 55.687150955200195
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1918 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1866 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1738 (MiB)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1012 12:07:23.556475 711816 DynoConfigLoader.cpp:32] Failed to read config: No dyno config client
```

# Test shape inference
`buck test mode/opt glow/fb/fx/acc_tracer:test_acc_shape_inference`
Output
```
...
Summary
  Pass: 95
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1407375092088240
```

Reviewed By: jfix71, jerryzh168

Differential Revision: D31457323

fbshipit-source-id: 8ccc4a9b0ca655fb30838e88575aff2bf3a387a6
2021-10-13 21:03:08 -07:00
Patrick Spencer
9fb6ba24e7 Update torch.fx.passes.split_module docstring (#65542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65542

Add docstring for torch.fx.passes.split_module that conforms to Google Python Style conventions.

Changed original example to the example from this diff:
https://www.internalfb.com/diff/D24925283 (9734c042b8)

Test Plan:
Ran buck test //caffe2/test:fx. No errors detected
https://pxl.cl/1QCch

Reviewed By: jamesr66a

Differential Revision: D31145694

fbshipit-source-id: 8e54f3b1be3dca1c4d414fdeeab71b9f2b5d9f3e
2021-10-07 10:37:10 -07:00
Jordan Fix
592481a5cc [fx][const_fold] Refactor to use base split module to simplify, and correctly handle non-single-Tensor outputs (#65933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65933

We use `split_module` to split the input model that we want to const fold into const and non-const subgraphs. Previously we were taking the non-const graph and trying to hack it back into the same signature as the input model. However, this was complex and buggy.

Instead, refactor to just keep using the base split module that contains both const and non-const graphs. This means we (see the sketch after this list):
- Inline the non-const graph into the split module
- Remove the const graph from the module and replace it with a getattr that will be run to insert that attr when we `run_folding`
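
A rough usage sketch of the resulting flow, assuming the `split_const_subgraphs` entry point in `torch.fx.experimental.const_fold` (names and signature hedged, not copied from this diff):
```python
import torch
from torch.fx.experimental import const_fold

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        # (self.w + 1) depends only on attributes, so it lands in the const subgraph
        return x @ (self.w + 1)

folded = const_fold.split_const_subgraphs(M())
folded.run_folding()               # run the const subgraph once, stash the result as an attr
print(folded(torch.randn(2, 4)))   # non-const graph reads the folded attr via getattr
```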

Test Plan: Added test coverage for the newly supported folding, and updated other tests for the new strategy.

Reviewed By: yinghai

Differential Revision: D31293307

fbshipit-source-id: 6e283a8c7222cf07b14e30e74dffc8ae5ee8b55f
2021-10-01 10:26:29 -07:00
Jerry Zhang
3d6d4f4322 [fx2trt][quant] Add lowering support for per channel quantization in fx2trt (#64787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64787

This PR adds support for lowering per-channel quantization and dequantization operators
in fx2trt. It also extends TensorMeta with extra arguments corresponding to per-channel quantized Tensors.
Initially I was thinking of adding a qparam that can capture everything, but currently we still have some lowering support
for fbgemm ops (which have scale and zero_point in the operator interface). I think we can move everything to qparams
after we deprecate lowering support for fbgemm ops in the future.
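
For context, a small standalone example of the per-channel ops and the qparams they carry (plain PyTorch, independent of the fx2trt lowering):
```python
import torch

x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2, 0.3])   # one scale per channel along axis=1
zero_points = torch.tensor([0, 0, 0])    # one zero_point per channel

qx = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)
print(qx.q_per_channel_scales())         # tensor of per-channel scales
print(qx.q_per_channel_zero_points())    # tensor of per-channel zero points
print(qx.q_per_channel_axis())           # 1
print(torch.dequantize(qx).shape)        # back to float, same shape as x
```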

Test Plan:
Test for per channel weight:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```

change BC compatibility test expect for TensorMeta
```
python test/test_fx.py TestFXAPIBackwardCompatibility.test_class_member_back_compat --accept
```

Imported from OSS

Reviewed By: jfix71, mrshenli, 842974287

Differential Revision: D30879848

fbshipit-source-id: 76c3804bb1d9343183ae53d9f02c1a3bf6c79e1c
2021-09-30 18:54:14 -07:00
Kefei Lu
d4d3bb91f9 Refactor OperatorSupport related code and fix TRT not supporting int64 dtype (#65848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65848

This diff includes:

* [fix]: The initialization of `OperatorSupport._support_dict` makes it a class variable, so we need to move its initialization into the constructor.
* Add an abstract class (more of an interface) `OperatorSupportBase`, since `OperatorSupport`'s purpose is too specific.
* [refactor]: what `TRTOperatorSupport` really does is populate an `OperatorSupport._support_dict`, so there is no reason for subclassing. Remove it and instead instantiate an `OperatorSupport` with a properly populated `_support_dict`.
* Add a framework for defining simple, basic op support logic and composing it into more complex rules (see the sketch after this list):
    1. `create_op_support` wraps a function into an `OperatorSupportBase` instance
    2. `chain` can combine several simple `OperatorSupportBase` instances into more complex ones
    3. `OpSupports` provides a set of pre-defined, simple `OperatorSupportBase` rules that can be composed together using `chain`.
        1. Currently the only pre-defined one is `decline_if_input_dtype(..)`, which declares a node unsupported if its args are of a user-specified dtype
* Fix `TRTOperatorSupport` so that it not only looks for registered converters, but also declines a node if its arg is of dtype int64
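
A rough composition sketch under assumed signatures (names come from this diff, assuming they live in `torch.fx.passes.operator_support` and that the wrapped callable receives `(submodules, node)` and returns a bool):
```python
import torch
from torch.fx.passes.operator_support import OpSupports, chain, create_op_support

def no_getattr_nodes(submodules, node):
    # Example custom rule: declare get_attr nodes unsupported.
    return node.op != "get_attr"

op_support = chain(
    OpSupports.decline_if_input_dtype(torch.int64),  # pre-defined rule from this diff
    create_op_support(no_getattr_nodes),             # custom callable wrapped into OperatorSupportBase
)
# A splitter would then query: op_support.is_node_supported(submodules, node)
```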

Test Plan: linter and CI

Reviewed By: 842974287

Differential Revision: D31275525

fbshipit-source-id: bbc02f7ccf4902a7912bb98ba5be2c2fbd53b606
2021-09-30 13:36:26 -07:00
Kefei Lu
911d01c1de type annotate operator_support (#65136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65136

Opportunistically add type annotation for operator_support.py

Test Plan: run linter, CI

Reviewed By: yinghai

Differential Revision: D30928464

fbshipit-source-id: 615c75152b9938792f03cdceb2a113bda6ab28c7
2021-09-29 10:38:47 -07:00
James Reed
0559cb37cd [FX] Ensure BC coverage for all of torch.fx.passes (#65081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65081

Test Plan: Imported from OSS

Reviewed By: jbschlosser, khabinov

Differential Revision: D30967428

Pulled By: jamesr66a

fbshipit-source-id: 2ff83da728dc469f086cf504e71b43396db612d8
2021-09-17 09:32:43 -07:00
James Reed
cf7409e184 [FX] Move graph_manipulation and param_fetch out of experimental and into passes (#65183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65183

ghstack-source-id: 138309655

Test Plan: waitforsadcastle

Reviewed By: protonu

Differential Revision: D31007630

fbshipit-source-id: 77d14b284737aabbe2b9e6394177a0c2e40aafba
2021-09-17 09:32:40 -07:00
James Reed
874f9bd509 [FX] Gate FXGraphDrawer on whether pydot is installed (#65088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65088

Test Plan: Imported from OSS

Reviewed By: khabinov

Differential Revision: D30967951

Pulled By: jamesr66a

fbshipit-source-id: dba2f13a47889b3d4187de925b4fe74ee90b7f79
2021-09-16 10:04:33 -07:00
Kefei Lu
adbcc819cd Fix fx2trt SplitterBase non_tensor_input logic (#64286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64286

During graph splitting, `_SplitterBase` can take into consideration whether the subnet boundary nodes
produce "supported" outputs that will cross the acc/non-acc boundary. Specifically, if the backend only
supports Tensor-based data passing across the boundary, then we cannot split the graph at a place where the node
output is a non-Tensor type (e.g., `Tuple[Tensor]`).

There's currently a bug in this logic: it does not correctly detect the output type of a Node. Instead of
using `Node.meta['tensor_meta']`, we should check `Node.meta['type']`.

`Node.meta['tensor_meta']` is not appropriate because this key will exist if the node output is an iterable
and one of the elements is of type `Tensor`. So `Tuple[Tensor]` will wrongly be considered "supported".
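
A hypothetical helper illustrating the corrected check (not the actual splitter code):
```python
import torch
from torch.fx import Node

def outputs_bare_tensor(node: Node) -> bool:
    # meta['type'] records the Python type of the node's output (e.g. set by
    # shape propagation), so containers like Tuple[Tensor] are correctly
    # rejected, whereas meta['tensor_meta'] may exist for them as well.
    return node.meta.get("type", None) is torch.Tensor
```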

Test Plan:
arc lint
run CI tests

Reviewed By: yinghai, 842974287

Differential Revision: D30617147

fbshipit-source-id: e8ba70dfaddc05cafb8037d58fca73b7ccbb1a49
2021-09-07 04:02:29 -07:00
James Reed
538647fe1f [WIP][FX] BC guarantees for 1.10 (#63888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63888

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30523133

Pulled By: jamesr66a

fbshipit-source-id: b04cc0d842a74862f42ecba98b757310cd2ec7b0
2021-08-30 19:56:46 -07:00
Kefei Lu
5757d03145 Add logging for _MinimizerBase
Summary: Add logging so we know which nodes are currently being visited

Test Plan: lint & SC tests

Reviewed By: 842974287

Differential Revision: D30509865

fbshipit-source-id: 09e77e44c97c825242e0b24f90463b50f3ca19c6
2021-08-26 00:52:58 -07:00
Oleg Khabinov
a0c1c7e5d4 Fixing the case when starter nodes depend on get_attr node (#62234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62234

There was a typo that we didn't catch until recently, hence this fix.

Reviewed By: 842974287

Differential Revision: D29924190

fbshipit-source-id: ee6259fcd41358aefe9680b419acc87c0c2821cb
2021-07-27 10:29:53 -07:00
Shiyan Deng
cc18654d66 [fx_acc] Refactoring acc_tracer (#61963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61963

Test Plan: CI

Reviewed By: jfix71

Differential Revision: D29772522

fbshipit-source-id: 4b117735147624f9428b933ea798495823423a0e
2021-07-21 20:09:15 -07:00
Zeina Migeed
4e2fe9718d flatten operation (resnet50) (#61265)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61265

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29626383

Pulled By: migeed-z

fbshipit-source-id: 107769fc14f1fad295a93a10e84235f25ae17357
2021-07-16 16:06:10 -07:00
Malay Bag
287c0ab170 [FX] Add requires_grad to TensorMetadata (#60972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60972

For PyTorch model memory requirement calculation, requires_grad is needed. Output tensors with requires_grad are saved in the module context and increase memory during the forward pass.
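
A rough sketch (assumed `TensorMetadata` field names) of how the new flag can feed a memory estimate after shape propagation:
```python
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp

gm = symbolic_trace(torch.nn.Linear(8, 8))
ShapeProp(gm).propagate(torch.randn(2, 8, requires_grad=True))

saved_bytes = 0
for node in gm.graph.nodes:
    tm = node.meta.get("tensor_meta")
    if tm is not None and getattr(tm, "requires_grad", False):
        # Outputs that require grad are kept alive for backward; count their size.
        saved_bytes += tm.shape.numel() * torch.empty((), dtype=tm.dtype).element_size()
print(saved_bytes)
```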

Test Plan: Existing test cases

Reviewed By: jamesr66a

Differential Revision: D29024932

fbshipit-source-id: def990f8c6ff6fa4537bfc377c646b9d44464ebd
2021-06-29 23:07:27 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but that `mypy` doesn't recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see whether they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.
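
For example, with `warn_unused_ignores = True` set under the `[mypy]` section of the config (e.g. `mypy.ini`), a suppression that current `mypy` no longer needs is reported instead of silently lingering:
```python
from typing import List

def first(xs: List[int]) -> int:
    # If current mypy already accepts this line, the stale suppression below
    # is reported as an unused "type: ignore" comment.
    return xs[0]  # type: ignore
```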

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
Oleg Khabinov
0d7d316dc1 [fx ir] Support lists and dicts in FX IR GraphDrawer (#58775)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58775

Reviewed By: RoshanPAN

Differential Revision: D28613939

fbshipit-source-id: 4164e2dd772b59240ea3907001fe4ebddb003060
2021-06-10 01:55:53 -07:00
Jordan Fix
6d97a80dd2 [fx][graph_drawer] Improve graph drawer coloring and tensor_meta handling (#58699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58699

Make `call_function`/`call_method` nodes random colors based on their target name; the coloring is stable, so the same target name always gets the same color. Also handle tensor_meta more elegantly for quantized types, including printing q_scale/q_zero_point if they're used.
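
A hypothetical sketch of the "stable random color" idea (not the actual drawer code): derive the color deterministically from the target name.
```python
import hashlib

def stable_color(target_name: str) -> str:
    # Hash the target name so the same op always maps to the same color across runs.
    digest = hashlib.md5(target_name.encode()).hexdigest()
    return "#" + digest[:6]   # first three bytes as an RGB hex color

print(stable_color("call_function: add"))  # identical input -> identical color every time
```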

Test Plan: Tested locally

Reviewed By: chenccfb, 842974287

Differential Revision: D28580333

fbshipit-source-id: ad9961e1106a1bfa5a018d009b0ddb8802d2163c
2021-05-20 21:26:04 -07:00
Shiyan Deng
bcacf91a71 [fx_glow]Add Support for importing quantized linear in FXIRImporter (#57483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57483

Pull Request resolved: https://github.com/pytorch/glow/pull/5622

Quantized linear has packed parameters. We want to unpack them so that it is easier for graph optimization and the importer to deal with the weight and bias. A customized remapping function is used to unpack quantized linear and map it to acc_op.linear.
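
A hedged sketch of what "packed" means here; the `_weight_bias()` helper on the quantized module is an assumption for illustration, and the packed format depends on the quantization backend:
```python
import torch

qlinear = torch.nn.quantized.Linear(4, 4)
weight, bias = qlinear._weight_bias()   # assumed helper: recover plain weight/bias tensors
print(weight.shape, None if bias is None else bias.shape)
```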

Test Plan: `buck test glow/fb/fx/nnpi_importer:test_importer`

Reviewed By: gcatron, jfix71, khabinov

Differential Revision: D27451237

fbshipit-source-id: e46e961734788fd5333e227ca6143fd37c33204e
2021-05-14 18:48:31 -07:00
Shiyan Deng
9d56176034 Fix splitter and add a unittest (#58075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58075

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1687

Reviewed By: mikekgfb

Differential Revision: D28357724

fbshipit-source-id: 36c2d211576a90107bc75468a39408ffecaeed43
2021-05-12 10:40:37 -07:00
Oleg Khabinov
36a22967b7 [fx ir] Handle the case when output consumes get_attr directly (#57844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57844

Reviewed By: 842974287

Differential Revision: D28294298

fbshipit-source-id: db337fadca9f10f208324c9da6d95620178a189b
2021-05-10 22:04:43 -07:00
Aravind Kalaiah
747312bf61 Support for accumulate nodes traversal and to access op names in the compare function (#57685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57685

- Accumulate traversal: `minimizer.settings.traverse_method = "accumulate"`
   - Feature
   - net_min_tests
- Return op name to the compare function so that we can map the cosine similarity to the individual ops
- Fix the settings combinations in net_min_tests

Test Plan:
buck test glow/fb/nnpi/lowering:net_min_tests

NNPI_LOG_LEVEL=5 USE_INF_API=1 buck run mode/opt -j 12 --config fbcode//cxx.link_weight=3 --config misc.strip_binaries=debug-non-line -c glow.nnpi_project_name='fb-nnpi-nextgen' ai_codesign/video/inference:xrayvideo_2019a_eval -- --job create --model_a model_prod --device_a PTCPU --trace_a none --model_b model_v3 --device_b NNPI --trace_b fusion --replace_b true --log_level INFO --use_scrambled false --save_repro false --num_ab_runs 0 --symbolic_trace_b true --save_modified_model_b false

USE_INF_API=1 buck test glow/fb/nnpi/lowering:net_min_tests

Reviewed By: 842974287

Differential Revision: D27867010

fbshipit-source-id: 6a756468b1f1fe24ef0400669d911825a7562484
2021-05-10 15:52:17 -07:00
Oleg Khabinov
73f22bcbf9 [fx ir] Handle cases in GraphDrawer when shape, type or stride are not present (#57845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57845

As title says

Test Plan: N/A

Reviewed By: 842974287

Differential Revision: D28295999

fbshipit-source-id: f2cbf80c468f13685b17bb396c1f48972744ced0
2021-05-07 17:24:48 -07:00
Shiyan Deng
d896d1f4ce [fx splitter] Fix fusion group utility (#57280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57280

We've found an issue where a fusion group would result in a circular dependency. For example:
```
a -> b -> c -> d
|              ^
+--------------+
```
Only a has a non-tensor output, and currently we would create a fusion group (a, b, d). This results in a circular dependency because the fusion group now depends on c while c depends on the fusion group as well.

This diff implements the solution discussed before: when we add a node to a fusion group, we also add all the nodes that lie between the existing fusion group and the newly added node.

Use the same logic in the minimizer to build fusion groups.
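
A hypothetical helper sketching the rule (not the actual `_SplitterBase` code): the nodes to pull in are exactly those that are downstream of the group and upstream of the newly added node.
```python
def nodes_in_the_middle(group, new_node, users_of, inputs_of):
    """group: current fusion group; users_of/inputs_of: adjacency dicts of the graph."""
    def reachable(starts, edges):
        seen, stack = set(), list(starts)
        while stack:
            n = stack.pop()
            for m in edges.get(n, ()):
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    downstream_of_group = reachable(group, users_of)     # everything the group feeds
    upstream_of_new = reachable([new_node], inputs_of)   # everything feeding new_node
    # Pulling these in prevents an outside node (like `c` above) from both
    # depending on the fusion group and being depended on by it.
    return downstream_of_group & upstream_of_new
```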

Test Plan: split_tests and net_min_tests

Reviewed By: khabinov

Differential Revision: D27917432

fbshipit-source-id: a3d99fe5929dbc9f8eb0f45bccd83fd7b173795a
2021-04-30 10:18:01 -07:00
Shiyan Deng
a6fa6a6cda [fx minimizer] Add an option to minimizer to allow return all intermediate results (#57279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57279

Added an option "return_intermediate". If true, when building the submodule we want to run, we replace the output with all the nodes, so that the intermediate results of every node are returned as output.

This is recommended for use with the `run_node()` function.
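
A rough standalone illustration of the idea (not the Minimizer's actual implementation): rewrite the graph's output node so every intermediate node is returned.
```python
import torch
from torch.fx import GraphModule, symbolic_trace

def return_all_intermediates(gm: GraphModule) -> GraphModule:
    graph = gm.graph
    output_node = next(n for n in graph.nodes if n.op == "output")
    intermediates = [n for n in graph.nodes if n.op not in ("placeholder", "output")]
    output_node.args = (tuple(intermediates),)   # return every intermediate, not just the last
    gm.recompile()
    return gm

gm = return_all_intermediates(symbolic_trace(torch.nn.Sequential(torch.nn.ReLU(), torch.nn.Sigmoid())))
print(gm(torch.randn(3)))   # tuple with the output of each node in the submodule
```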

Test Plan: `buck test glow/fb/nnpi/lowering:net_min_tests`

Reviewed By: khabinov

Differential Revision: D27913887

fbshipit-source-id: 5a3eab02da05214fb9adeb25656c267b58075b1d
2021-04-29 13:46:25 -07:00
Horace He
786b0a8091 [FX] fix normalization issues with lists of tensors (#57004)
Summary:
Fixes an issue where lists of tensors were not being normalized correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57004

Reviewed By: jamesr66a

Differential Revision: D28034559

Pulled By: Chillee

fbshipit-source-id: f935f0b73a8356acd8a2ae93fcfc0417f0eab224
2021-04-27 20:02:00 -07:00
Shiyan Deng
45692fbef0 [fx splitter][fx net_min] Move Splitter, Minimizer and necessary deps to OSS (#56201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56201

Refactor Splitter and Minimizer into superclasses `_SplitterBase` and `_MinimizerBase` and move them to OSS. This is needed to create an OSS example of GPU lowering with those tools.

Test Plan: CI

Reviewed By: jackm321

Differential Revision: D27629598

fbshipit-source-id: 0d4da02105ca509b31f1a6c4a39b1122c2bc7bf0
2021-04-24 15:19:12 -07:00
Horace He
0df239e550 [FX] Make arg normalization a method on Node and not a pass (also augment tests to be exhaustive) (#55992)
Summary:
Commandeered from https://github.com/pytorch/pytorch/pull/54563

Primary changes from first PR:
1. Refactored the primary `normalize_function` logic into `operator_schemas.py` so that non-FX users can use it (usage sketched below).
2. Refactored tests a bit, and added a path to call `normalize_function` directly.
3. Moved check for `boolean_dispatch` so that `torch.lu` also gets properly handled.
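
A small usage sketch of the refactored entry point (assuming it is importable from `torch.fx.operator_schemas`; the exact return fields are hedged):
```python
import torch
from torch.fx.operator_schemas import normalize_function

x = torch.randn(2, 3)
pair = normalize_function(torch.transpose, (x, 0, 1), normalize_to_only_use_kwargs=True)
if pair is not None:                 # None means the overload couldn't be resolved
    print(pair.args, pair.kwargs)    # positional args folded into keyword form
```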

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55992

Reviewed By: mruberry

Differential Revision: D27774396

Pulled By: Chillee

fbshipit-source-id: 7f65632e1d608e4abd55aec5ccbfdc3f67f52b8e
2021-04-22 03:53:41 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
James Reed
d02919dd50 [FX] Make shape_prop handle targets with aggregate outputs (#56221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56221

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D27810693

Pulled By: jamesr66a

fbshipit-source-id: 17c6ad671786b3bacb5026bd88b8f5b7b4b96a1a
2021-04-16 18:58:25 -07:00
Jordan Fix
5eadc243f3 Preserve node meta info in split_module (#56212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56212

The current design doesn't make it easy to use `node.copy()`. Explicitly copy over the node's meta.
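
A small sketch of the pattern this enables when copying nodes between graphs (plain FX APIs, not the `split_module` internals):
```python
import torch
from torch.fx import Graph, symbolic_trace

gm = symbolic_trace(torch.nn.ReLU())
new_graph, env = Graph(), {}
for old_node in gm.graph.nodes:
    new_node = new_graph.node_copy(old_node, lambda n: env[n])
    new_node.meta = old_node.meta.copy()   # carry shape/type/etc. metadata along
    env[old_node] = new_node
```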

Test Plan: Updated `test_subgraph_creation` in `test_fx_experimental`

Reviewed By: jamesr66a

Differential Revision: D27808477

fbshipit-source-id: 7fe7b6428c830307dbd1e395f16fa2774936d3b3
2021-04-16 18:02:50 -07:00
James Reed
2236f43da0 [FX] Put tensor metadata into a NamedTuple in ShapeProp (#55930)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55930

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D27741730

Pulled By: jamesr66a

fbshipit-source-id: 0a0a1b94beed6c482add9e9551f316f3b4220ab2
2021-04-13 22:21:50 -07:00
James Reed
8bdea14cd3 [FX] Add memory_format to shape_prop (#55815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55815

Test Plan: Imported from OSS

Reviewed By: pbelevich, ansley

Differential Revision: D27716342

Pulled By: jamesr66a

fbshipit-source-id: f7c22dd77a4f48650700fc4c3c44b1c59196282e
2021-04-13 16:37:54 -07:00
Shiyan Deng
43ede4c2e3 Add Per Tensor Quantization Support to FXIRImporter (#55405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55405

Pull Request resolved: https://github.com/pytorch/glow/pull/5516

Allows FXIRImport to import quantized model.

This diff doesn't include support for per-channel weights, linear, and conv. Those will be addressed in the next diff.

Test Plan: buck test glow/fb/fx/nnpi_importer:test_importer

Reviewed By: jackm321, jfix71

Differential Revision: D27313543

fbshipit-source-id: bf5c96ef5f2ff1835c09db981e0ceefaec56dd5b
2021-04-09 10:49:48 -07:00
James Reed
641d4ff160 [FX] Add stride to shape_prop pass (#55108)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55108

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D27482241

Pulled By: jamesr66a

fbshipit-source-id: 7d928015712126e916c86225dc3ab27aba22d431
2021-04-02 19:57:11 -07:00
James Reed
bcb4583170 [FX] Add a metadata dict to Node and switch shapeprop to use that (#54926)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54926

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D27417801

Pulled By: jamesr66a

fbshipit-source-id: 68a5155120a235065f58aa64ba1a6a97818dd0c1
2021-03-31 14:36:54 -07:00
Zeina Migeed
5105250e16 [FX] Add docs for shape propagation (#54554)
Summary:
Fixes #54538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54554

Reviewed By: nikithamalgifb

Differential Revision: D27281263

Pulled By: jamesr66a

fbshipit-source-id: 2fd3914f0e24be0b6a18ad7715f3336dcf7949ba
2021-03-23 21:18:11 -07:00
James Reed
a1c5eba4bd [FX] Move some heavily used passes out of experimental (#51392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51392

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D26161172

Pulled By: jamesr66a

fbshipit-source-id: 04bfe606555bdf1988f527231d4de2e0196e6b37
2021-02-01 19:02:26 -08:00