Commit Graph

29 Commits

Author SHA1 Message Date
Xuehai Pan
6ff1e43a41 [BE][Easy][13/19] enforce style for empty lines in import segments in test/j*/ (#129764)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129764
Approved by: https://github.com/ezyang
2024-08-01 12:13:42 +00:00
Yuanhao Ji
604c9c5601 Enable UFMT on all of test/jit (#123623)
Partially addresses #123062

Ran lintrunner on:

- `test/jit`

with command:

```bash
lintrunner -a --take UFMT --all-files
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123623
Approved by: https://github.com/ezyang
2024-04-11 23:45:05 +00:00
Jason Ansel
ae57bd6630 PT2/TorchScript interoperability fix (#94678)
Allows torch.compile() to inline into ScriptFunction
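
A minimal sketch of the interaction this enables (illustrative only; the function names are made up):

```python
import torch

@torch.jit.script
def scripted_gelu_ish(x: torch.Tensor) -> torch.Tensor:
    # an existing TorchScript function
    return x * torch.sigmoid(1.702 * x)

@torch.compile
def model_step(x):
    # torch.compile() can now inline into the ScriptFunction above
    # instead of treating the call as an opaque boundary
    return scripted_gelu_ish(x) + 1.0

print(model_step(torch.randn(4)))
```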

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94678
Approved by: https://github.com/ezyang
2023-02-15 01:21:10 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. Keeping it could lead to potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so I'll squash for landing.
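
A hedged sketch of the two settings that remain after this cleanup, using the private Python bindings that mirror the C++ toggles (binding names are internal and may change):

```python
import torch

# Assumed private bindings mirroring the getExecutor / getGraphOptimize settings above.
torch._C._jit_set_profiling_executor(True)    # choose the Profiling Executor over the legacy one
torch._C._set_graph_executor_optimize(True)   # False would force the simple, unoptimized executor

@torch.jit.script
def f(x):
    return x * 2 + 1

x = torch.randn(8)
f(x)  # early runs collect profiles; later runs execute the specialized, optimized plan
```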

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
Elias Ellison
fc3622904f Add Gflags for fusion strategy and make it local to executor (#73668)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73668

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D34598128

Pulled By: eellison

fbshipit-source-id: a67a258d5dde8ad81bf19151bcf7a4da1321a2c0
(cherry picked from commit 578d3150f1557c4df91d35ad464896610bf8c23b)
2022-03-03 19:41:46 +00:00
Jane Xu
09c7771e9c Set test owners for jit tests (#66808)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66808

Reviewed By: mrshenli

Differential Revision: D31761414

Pulled By: janeyx99

fbshipit-source-id: baf8c49ff9c4bcda7b0ea0f6aafd26380586e72d
2021-10-25 07:51:10 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Bert Maher
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
Bert Maher
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
Bert Maher
8e82e932f3 Reland: D27652485: "[nnc] Enable CPU fusion only when num_threads == 1" (#56120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120

This reverts commit ad17fadbfc (D27786457).

The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.

I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to both turn on fusion (`_jit_override_can_fuse_on_cpu`) and either be running with 1 thread, or additionally call `_jit_set_texpr_parallel_cpu_enabled` to enable it anyway.

This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
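
A hedged sketch of that two-step override (private bindings; note that a later commit in this log removes the parallel-CPU toggle again):

```python
import torch

torch._C._jit_override_can_fuse_on_cpu(True)  # step 1: opt in to TE/NNC CPU fusion

if torch.get_num_threads() > 1:
    # step 2 (only needed when multi-threaded): allow fusion anyway, mainly for tests
    torch._C._jit_set_texpr_parallel_cpu_enabled(True)

@torch.jit.script
def fused(a, b):
    return (a * b + b).relu()

a, b = torch.randn(1024), torch.randn(1024)
for _ in range(3):
    fused(a, b)  # warm-up runs let the profiling executor form fusion groups
```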

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D27788199

Pulled By: bertmaher

fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
2021-04-15 15:50:18 -07:00
Natalia Gimelshein
ad17fadbfc Revert D27652485: [nnc] Enable CPU fusion only when num_threads == 1
Test Plan: revert-hammer

Differential Revision:
D27652485 (e7e164f9e6)

Original commit changeset: 182580cf758d

fbshipit-source-id: e3c95b06d1eef668095f3cf461485395179d94af
2021-04-14 20:23:15 -07:00
Bert Maher
e7e164f9e6 [nnc] Enable CPU fusion only when num_threads == 1 (#55621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621

Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259

Test Plan: observe the fusion groups formed when torch.get_num_threads() == 1 vs. when it is greater.
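
A sketch of how one might carry out that observation; `graph_for` is the usual way to inspect the executed graph, and the `prim::TensorExprGroup` node kind named in the comment is an assumption here:

```python
import torch

torch._C._jit_override_can_fuse_on_cpu(True)
torch.set_num_threads(1)  # per this change, CPU fusion only happens single-threaded

@torch.jit.script
def g(x, y):
    return torch.sigmoid(x) * torch.tanh(y)

a, b = torch.randn(100), torch.randn(100)
for _ in range(3):
    g(a, b)               # warm up the profiling executor
print(g.graph_for(a, b))  # with 1 thread, expect a fused prim::TensorExprGroup node
```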

Reviewed By: ZolotukhinM

Differential Revision: D27652485

fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
2021-04-14 09:16:54 -07:00
Raghavan Raman
c7a70eec1b Make LLVM the default backend for TE (#52314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264

When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error is reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.

This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because tests run in CI do not have LLVM.
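
A hedged sketch of the test-side toggle described above, so machines without LLVM keep the old flow; both private binding names (`_llvm_enabled`, `_jit_set_te_must_use_llvm_cpu`) are assumptions here:

```python
import torch

# Assumed private bindings; shown only to illustrate the CI workaround described above.
if not torch._C._llvm_enabled():
    # Without LLVM, don't require the LLVM backend for TE, so CPU fusion falls back
    # to the previous (slower) flow instead of raising an error.
    torch._C._jit_set_te_must_use_llvm_cpu(False)
```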

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314

Reviewed By: ejguan

Differential Revision: D26491294

Pulled By: navahgar

fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
2021-02-18 12:00:38 -08:00
Thomas Viehmann
ea087e2d92 JIT: guard DifferentiableGraph node (#49433)
Summary:
This adds guarding for DifferentiableGraph nodes so that we do not depend on type properties that have not been checked at runtime. It also bails out on required gradients for the CUDA fuser.

Fixes https://github.com/pytorch/pytorch/issues/49299

I still need to look into a handful of failing tests, but maybe this can serve as a basis for discussion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433

Reviewed By: ngimel

Differential Revision: D25681374

Pulled By: Krovatkin

fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
2021-01-08 20:01:27 -08:00
Elias Ellison
71ddc0ba19 [TensorExpr Fuser] Add support for nodes which have tensor constant inputs (#47814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47814

Previously, we would bail completely if a node had a constant tensor input. This PR adds support for this case by lifting the constant out of the fusion graph after we've done fusion. It might be nice to add support for Tensor Constants in NNC itself, but it looked kind of tricky and this is an easy enough temporary solution.
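
For illustration, one way a tensor constant can end up in a TorchScript graph (a sketch; `W` and `with_const` are made-up names, and under this change the constant is lifted out of the fusion group rather than blocking fusion):

```python
import torch

W = torch.randn(8)  # a global tensor captured by scripting is baked into the graph as a Constant

@torch.jit.script
def with_const(x):
    # previously the constant tensor input would prevent fusion of this node entirely
    return torch.relu(x * W + W)

with_const(torch.randn(8))
```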

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D25286215

Pulled By: eellison

fbshipit-source-id: 9ff67f92f5a2d43fd3ca087569898666525ca8cf
2020-12-10 12:19:47 -08:00
Elias Ellison
0a42003f8f [TensorExpr Fuser] Handle fusing values with un-profiled uses (#48689)
Summary:
Copying myself from the code comments:

A value can be profiled with differently typed uses. This can occur from:
- having a use which is not executed, so the type will be `TensorType::get()`
- control-flow that depends on tensor type: `if x.size() == 2: op(x) else: op(x)`
- mutation of the value on a field represented in the tensor type: `op(x); x.resize_([...]); op(x)`

The most common case today with num_profiles = 1 is from the first case. Here we can just ignore non-profiled uses, and choose any of the profiled uses. Because we guard all tensor types in the runtime, even if we set a Value to have a profiled type from one use and then execute a use with a different profiled type, we will still be correct. In the future we could consider unifying the types of uses, or adding a type refinement node so uses can have the correct corresponding type.
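
A small illustration of that first (most common) case, a use which is never executed (names are made up):

```python
import torch

@torch.jit.script
def partly_executed(x, flag: bool):
    if flag:
        return x * 2   # never runs below, so this use's profile stays TensorType::get()
    return x + 1

a = torch.randn(4)
partly_executed(a, False)  # only the else-branch use of x gets a specialized profiled type
```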

Fix for https://github.com/pytorch/pytorch/issues/48043. I think there's probably too much context required for that to be a good bootcamp task...

There was an observed missed fusion opportunity in detectron2 because of this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48689

Reviewed By: ngimel

Differential Revision: D25278791

Pulled By: eellison

fbshipit-source-id: 443e5e1254446a31cc895a275b5f1ac3798c327f
2020-12-04 12:48:10 -08:00
Elias Ellison
5dd288eb06 [JIT] Regularize tensorexpr fuser strategy with other fusers (#44972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44972

Previously, our fusion strategy would be:
- start at the end of the block, find a fusable node
- iteratively try to merge inputs into the fusion group, sorted topologically

This strategy works pretty well, but has the possibility of missing fusion groups. See my attached test case for an example where we wouldn't find all possible fusion groups. bertmaher found an example of a missed fusion group in one of our rnn examples (jit_premul) that caused a regression from the legacy fuser.

Here, I'm updating our fusion strategy to be the same as our other fusion passes - create_autodiff_subgraphs, and graph_fuser.cpp.

The basic strategy is:
- iterate until you find a fusible node
- try to merge the node's inputs; whenever a successful merge occurs, restart at the beginning of the node's inputs
- after you've exhausted a node, continue searching the block for fusion opportunities from that node
- continue doing this on the block until we go through an iteration without any successful merges

Since we create the fusion groups once, and only re-specialize within the fusion groups, we should be running this very infrequently (it only re-triggers when we fail undefinedness specializations). Also, because it's the same algorithm as the existing fusers, it is unlikely to cause a regression.
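
A rough Python sketch of that strategy (all names are hypothetical; the real pass is implemented in C++):

```python
def fuse_block(block, is_fusible, try_merge_input):
    """Illustrative only: mirrors the restart-on-merge strategy described above."""
    changed_in_iteration = True
    while changed_in_iteration:                 # stop once a full pass over the block makes no merges
        changed_in_iteration = False
        for node in block.nodes():
            if not is_fusible(node):
                continue
            merged = True
            while merged:                       # restart over the node's inputs after every successful merge
                merged = False
                for inp in node.inputs():
                    if try_merge_input(inp, node):
                        merged = True
                        changed_in_iteration = True
                        break
            # then continue searching the block for fusion opportunities from this node onward
```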

Test Plan: Imported from OSS

Reviewed By: Krovatkin, robieta

Differential Revision: D23821581

Pulled By: eellison

fbshipit-source-id: e513d1ef719120dadb0bfafc7a14f4254cd806ee
2020-09-24 15:34:21 -07:00
Elias Ellison
8df0400a50 Fix fallback graph in specialize autogradzero (#44654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654

Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks we would run the backward normally without reprofiling & optimizing.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23691764

Pulled By: eellison

fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
2020-09-15 11:12:20 -07:00
Elias Ellison
856510c96d [JIT] Dont optimize shape info in batch_mm (#44565)
Summary:
We remove profile nodes and specialize types before batch_mm, so we cannot run peepholes on the type information of tensors, since these properties have not been guarded and are not guaranteed to be correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565

Reviewed By: albanD

Differential Revision: D23661538

Pulled By: eellison

fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
2020-09-14 12:34:20 -07:00
Elias Ellison
cc5a1cf616 [JIT] Erase shapes before fallback graph (#44434)
Summary:
Previously, the specialized types were copied over to the fallback function, even though the tensors reaching the fallback are not guaranteed to be of those types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434

Reviewed By: SplitInfinity

Differential Revision: D23611943

Pulled By: eellison

fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
2020-09-10 12:07:31 -07:00
Sujoy Saraswati
54931ebb7b Release saved variable from DifferentiableGraphBackward (#42994)
Summary:
When the backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager-mode ops, this releases the saved inputs that were required for the backward grad function. However, with TorchScript we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(), so the SavedVariables stay alive longer. This implements release_variables() for DifferentiableGraphBackward to release those SavedVariables early.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994

Reviewed By: izdeby

Differential Revision: D23503172

Pulled By: albanD

fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
2020-09-08 14:36:52 -07:00
Elias Ellison
df67f0beab [TensorExpr fuser] Guard nodes that have tensor output properties determined by non-tensor inputs (#44137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44137

We only insert guards on Tensor types, so we rely on the output of a node being uniquely determined by its input types. We now bail if any non-Tensor input affects the output type and cannot be reasoned about statically.
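
An illustrative case of that bail condition (a sketch to clarify the idea, not code from the PR):

```python
import torch

@torch.jit.script
def shape_from_scalar(x, dim: int):
    # The output's sizes depend on the non-Tensor input `dim`. Only Tensor inputs get
    # runtime guards, so per this change the fuser bails on nodes like this unless the
    # scalar input is something it can reason about statically (e.g. a constant).
    return x.unsqueeze(dim)
```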

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23543602

Pulled By: eellison

fbshipit-source-id: abd6fe0b1fd7fe6fc251694d4cd442b19c032dd7
2020-09-05 01:40:18 -07:00
Elias Ellison
6868bf95c6 [JIT] Fuser match on schemas not node kind (#44083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44083

Match on the complete schema of a node instead of its node kind when deciding whether to fuse it. Previously we matched on node kind alone, which could fail for something like `aten::add(int, int)`, and if a new overload were added to an op without corresponding NNC support we would still fuse it.
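
For illustration, two ops that share a node kind (aten::add) but have different schemas; under the old node-kind match the first could be mistaken for a fusible Tensor add (a sketch, not taken from the PR):

```python
import torch

@torch.jit.script
def int_add(a: int, b: int) -> int:
    return a + b   # aten::add.int(int, int) -> int: same node kind, different schema

@torch.jit.script
def tensor_add(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a + b   # aten::add.Tensor(...): a schema the TE fuser actually supports
```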

Follow ups are:
- bail when an output tensor type isn't uniquely determined by the input types (e.g. aten::add, where the second input could be either a float or an int)
- remove NNC lowering for _tanh_backward & _sigmoid_backward
- Validate that we support all of the overloads here. I optimistically added ops that include Tensors; it's possible that we do not support every overload here. This isn't a regression, and this PR at least improves our failures in that regard.

I can do any of these as part of this PR if desired, but there are a number of failures people have run into that this PR fixes, so I think it would be good to land this sooner rather than later.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23503704

Pulled By: eellison

fbshipit-source-id: 3ce971fb1bc3a7f1cbaa38f1ed853e2db3d67c18
2020-09-03 14:47:19 -07:00
Bert Maher
33d51a9b32 Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43967

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D23469048

Pulled By: bertmaher

fbshipit-source-id: 1005a7ae08974059ff9d467492caa3a388070eeb
2020-09-02 18:00:25 -07:00
Elias Ellison
3c8b1d73c9 Update aliasing in tensorexpr fuser (#43743)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43743

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23385205

Pulled By: eellison

fbshipit-source-id: 097a15d5bcf216453e1dd144d6117108b3deae4d
2020-08-31 11:52:26 -07:00
Elias Ellison
5da8a7bf2d use types in the IR instead of vmap (#43742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43742

We can remove all prim::profiles, update the values to their specialized profiled types, and then later guard the input graphs based on the input types of the fusion group. After that we remove the specialized tensor types from the graph. This gets rid of having to update the vmap and removes all of the profile nodes during fusion.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D23385206

Pulled By: eellison

fbshipit-source-id: 2c84bd1d1c38df0d7585e523c30f7bd28f399d7c
2020-08-31 11:52:23 -07:00
Elias Ellison
01f974eb1e Specialize optionals for grad_sum_to_size (#43633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633

In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called; otherwise the second input is None and it is a no-op. Most of the time, it's a no-op (in the fast RNNs benchmark > 90% of the time).

We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.

In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.
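
A sketch of what the op being specialized does (semantics only, not the internal implementation):

```python
import torch
from typing import List, Optional

def grad_sum_to_size(grad: torch.Tensor, size: Optional[List[int]]) -> torch.Tensor:
    # Mirrors aten::_grad_sum_to_size: when no broadcast happened in the forward,
    # size is None and this is a no-op (the >90% case mentioned above).
    if size is None:
        return grad
    return grad.sum_to_size(*size)  # reduce the gradient back to the pre-broadcast shape
```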

Test Plan: Imported from OSS

Reviewed By: bwasti, ZolotukhinM

Differential Revision: D23358809

Pulled By: eellison

fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
2020-08-27 14:35:37 -07:00
Elias Ellison
a19fd3a388 Add undefined specializations in backward (#43632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43632

Specialize the backward graph by guarding on the undefinedness of the input tensors. The graph will look like:
```
ty1, ty2, successful_checks = prim::TypeCheck(...)
if successful_checks:
    -> optimized graph
else:
    -> fallback graph
```

Specializing on the undefinedness of tensors allows us to clean up the
```
if any_defined(inputs):
    outputs = <original_computation>
else:
    outputs = autograd zero tensors
```
blocks that make up the backward graph, so that we can fuse the original_computation nodes together.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23358808

Pulled By: eellison

fbshipit-source-id: f5bb28f78a4a3082ecc688a8fe0345a8a098c091
2020-08-27 14:35:35 -07:00
Elias Ellison
a4cf4c2437 refactor tests (#43631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43631

I added a new test for just profiler stuff - I don't think the test should go in test_jit.py. Maybe this should just go in test_tensorexpr_fuser, but I'm not really testing tensorexpr stuff either... LMK

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D23358810

Pulled By: eellison

fbshipit-source-id: 074238e1b60e4c4a919a052b7a5312b790ad5d82
2020-08-27 14:35:33 -07:00