Commit Graph

234 Commits

Author SHA1 Message Date
Xuehai Pan
754fb834db [BE][CI] bump ruff to 0.9.0: string quote styles (#144569)
Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting

- Change the outer quotes to double quotes for nested f-strings

```diff
- f'{", ".join(args)}'
+ f"{', '.join(args)}"
```

- Change the inner quotes to double quotes for triple f-strings

```diff
  string = """
-     {', '.join(args)}
+     {", ".join(args)}
  """
```

- Join implicitly concatenated strings

```diff
- string = "short string " "short string " f"{var}"
+ string = f"short string short string {var}"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569
Approved by: https://github.com/Skylion007
ghstack dependencies: #146509
2025-02-24 19:56:09 +00:00
Aaron Gokaslan
f3304571fc [BE][Ez]: FURB148 - remove useless enumerate calls (#145619)
Remove useless enumerate calls

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145619
Approved by: https://github.com/drisspg
2025-01-24 23:37:15 +00:00
Aaron Orenstein
99dbc5b0e2 PEP585 update - test (#145176)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145176
Approved by: https://github.com/bobrenjc93
2025-01-22 04:48:28 +00:00
Tom Ritchford
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
Xuehai Pan
4226ed1585 [BE] Format uncategorized Python files with ruff format (#132576)
Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #132574
2024-08-04 17:13:31 +00:00
Oguz Ulgen
221350e3a4 Add None return type to init -- tests (#132352)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352
Approved by: https://github.com/ezyang
ghstack dependencies: #132335, #132351
2024-08-01 15:44:51 +00:00
Xuehai Pan
ba48cf6535 [BE][Easy][6/19] enforce style for empty lines in import segments in test/ (#129757)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757
Approved by: https://github.com/ezyang
2024-07-17 06:42:37 +00:00
Hu Niu
36e9b71613 Enable UFMT on test/test_jit_fuser_te.py (#127759)
Part of #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127759
Approved by: https://github.com/ezyang
2024-06-04 16:56:03 +00:00
andrewor14
773ae817f7 Batch Norm Consolidation (#116092)
**Summary:**

This commit simplifies the existing decomposition hierarchy
of batch norm ops by adding a single, backend agnostic op:
`batch_norm_with_update`. The existing hierarchy looks like:

```
aten.batch_norm ->
aten._batch_norm_impl_index ->
[
  aten.native_batch_norm ->
  aten._native_batch_norm_legit (export only) ->
  _batch_norm_legit_cpu/cuda (kernels, export only) ->
  _batch_norm_cpu/cuda (kernels)
] OR
[ aten.cudnn_batch_norm ] OR
[ aten.miopen_batch_norm ]
```

Aside from complexity, an important problem with the
above decomposition hierarchy is cuda numerics in
export flows. We observed significantly worse convergence
when training a mobilenetv2-like model when using the
`_batch_norm_cuda` kernel instead of the `cudnn_batch_norm`
kernel. This means users who export their models on CPU
first then move the models to cuda later may silently
see worse accuracies even when cudnn is installed,
because they are using the worse kernel. This issue is
summarized in https://github.com/pytorch/pytorch/issues/111384.

Instead, the new hierarchy proposed by consolidating
existing batch norm ops will look like:

```
aten.batch_norm ->
aten.batch_norm_with_update ->
[ _batch_norm_cpu (kernel) ] OR
[ _batch_norm_cuda (kernel) ] OR
[ cudnn_batch_norm (kernel) ] OR
[ miopen_batch_norm (kernel) ]
```

The new op `batch_norm_with_update` hides backend
implementation details and automatically picks the right
kernel based on what is installed. This commit also adds
the following variants to this op:

```
batch_norm_with_update_functional
batch_norm_with_update.out
batch_norm_no_update
batch_norm_no_update.out
batch_norm_backward
```

Note that this commit only adds this op and its variants,
but does not actually change the decomps to produce these
ops in the graph. This will be done after the 2 week FC
window, and the ops used in the old stack is planned to
be removed after the 6 month BC window.

Test Plan: `OpInfo` tests for `batch_norm_with_update`.

Reviewers: albanD, bdhirsh

Subscribers: albanD, bdhirsh, supriyar

Tasks: https://github.com/pytorch/pytorch/issues/111384

Differential Revision: [D54805279](https://our.internmc.facebook.com/intern/diff/D54805279)
Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2024-03-18 21:01:30 +00:00
PyTorch MergeBot
fd0dbcd891 Revert "Batch Norm Consolidation (#116092)"
This reverts commit 7b4f70eda5.

Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/osalpekar due to Causes build failure in //caffe2:aten-hip (AMD build) target. See [D54707318](https://www.internalfb.com/diff/D54707318) for more details, may require internal build system changes to resolve. ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1989542965))
2024-03-11 22:22:41 +00:00
andrewor14
7b4f70eda5 Batch Norm Consolidation (#116092)
**Summary:**

This commit simplifies the existing decomposition hierarchy
of batch norm ops by adding a single, backend agnostic op:
`batch_norm_with_update`. The existing hierarchy looks like:

```
aten.batch_norm ->
aten._batch_norm_impl_index ->
[
  aten.native_batch_norm ->
  aten._native_batch_norm_legit (export only) ->
  _batch_norm_legit_cpu/cuda (kernels, export only) ->
  _batch_norm_cpu/cuda (kernels)
] OR
[ aten.cudnn_batch_norm ] OR
[ aten.miopen_batch_norm ]
```

Aside from complexity, an important problem with the
above decomposition hierarchy is cuda numerics in
export flows. We observed significantly worse convergence
when training a mobilenetv2-like model when using the
`_batch_norm_cuda` kernel instead of the `cudnn_batch_norm`
kernel. This means users who export their models on CPU
first then move the models to cuda later may silently
see worse accuracies even when cudnn is installed,
because they are using the worse kernel. This issue is
summarized in https://github.com/pytorch/pytorch/issues/111384.

Instead, the new hierarchy proposed by consolidating
existing batch norm ops will look like:

```
aten.batch_norm ->
aten.batch_norm_with_update ->
[ _batch_norm_cpu (kernel) ] OR
[ _batch_norm_cuda (kernel) ] OR
[ cudnn_batch_norm (kernel) ] OR
[ miopen_batch_norm (kernel) ]
```

The new op `batch_norm_with_update` hides backend
implementation details and automatically picks the right
kernel based on what is installed. This commit also adds
the following variants to this op:

```
batch_norm_with_update_functional
batch_norm_with_update.out
batch_norm_no_update
batch_norm_no_update.out
batch_norm_backward
```

Note that this commit only adds this op and its variants,
but does not actually change the decomps to produce these
ops in the graph. This will be done after the 2 week FC
window, and the ops used in the old stack is planned to
be removed after the 6 month BC window.

Test Plan: `OpInfo` tests for `batch_norm_with_update`.

Reviewers: albanD, bdhirsh

Subscribers: albanD, bdhirsh, supriyar

Tasks: https://github.com/pytorch/pytorch/issues/111384

Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2024-03-08 15:07:15 +00:00
rzou
7eecbf8a30 Remove unnecessary skipIfTorchDynamo from test_jit_fuser_te (#118728)
And add some expected failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118728
Approved by: https://github.com/bdhirsh
2024-02-12 20:55:29 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could, a few I could not figure out what the proper fix for them was though and so I left them with noqas.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
Aaron Gokaslan
ee5d981249 [BE]: Enable RUFF PERF402 and apply fixes (#115505)
* Enable PERF402. Makes code more efficient and succinct by removing useless list copies that could be accomplished either via a list constructor or extend call. All test cases have noqa added since performance is not as sensitive in that folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
David Berard
405f014c26 [jit] Skip NNAPI, test_ivalue, CPU NNC tests in fbcode (#108937)
Summary:
NNAPI: Internal test infra can't find test_nnapi.py. Easiest solution is to just skip these tests if test_nnapi.py can't be found
test_ivalue: fails due to qscheme op not implemented for CPU backend. In OSS, it doesn't run because it's not included in test_jit.py.
CPU NNC tests: test_working_byte_cpu_float32 is failing, but hard to repro; we don't use CPU NNC internally, so let's just skip CPU NNC tests internally.

Differential Revision: D48041615

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108937
Approved by: https://github.com/eellison
2023-09-11 22:42:30 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Justin Chu
79c9e82e27 Fix flake8 lint errors reported by ruff - take 2 (#99798)
Replaces #99784. This PR is pure autofix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-04-23 23:09:51 +00:00
Catherine Lee
1f85390eb2 Skip test_batch_norm in test_jit_fuser_te for asan (#98016)
it takes 10+ minutes on asan?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98016
Approved by: https://github.com/huydhn
2023-03-30 20:58:41 +00:00
Jason Ansel
ae57bd6630 PT2/TorchScript interoperability fix (#94678)
Allows torch.compile() to inline into ScriptFunction

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94678
Approved by: https://github.com/ezyang
2023-02-15 01:21:10 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This changes replace all remaining unnecessary generator expressions with list/dict/set comprehensions which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as 'set(a for a in b)`, resolving it into just the set call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
PyTorch MergeBot
913cf2908e Revert "Disable torch_jit_fuser_te for dynamo CI (#92945)"
This reverts commit 0fc2f9febb.

Reverted https://github.com/pytorch/pytorch/pull/92945 on behalf of https://github.com/huydhn due to The test looks ok now after moving dynamo shard to 3.8 https://github.com/pytorch/pytorch/issues/92942, so trying to re-enable it
2023-01-26 21:41:17 +00:00
Nikita Shulga
0fc2f9febb Disable torch_jit_fuser_te for dynamo CI (#92945)
Not clear, what caused SIGIOT, but we need to get signal from other tests (and NNC+Dynamo is probably not the most important usecase)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92945
Approved by: https://github.com/ezyang, https://github.com/huydhn
2023-01-25 05:02:59 +00:00
David Berard
c4a6f21b50 [JIT] Add tests for pow() with different dtype inputs (#91946)
Fixes #75476

Apparently this NNC bug has been fixed at some point. Adding tests to track this and verify via CI that this is actually fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91946
Approved by: https://github.com/qihqi
2023-01-12 19:39:55 +00:00
Sergii Dymchenko
00118f5c30 Fix issue 38095 TODO in test_jit_fuser_te.py (#90246)
Fix TODO related to https://github.com/pytorch/pytorch/issues/38095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90246
Approved by: https://github.com/clee2000
2022-12-08 01:39:26 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Philip Meier
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
anjali411
9eb771583c symintify rand and randint functions and meta suport for randint (#86358)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86358
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-10-10 17:07:11 +00:00
PyTorch MergeBot
7f607e8cb5 [torchdynamo hash update] update the pinned torchdynamo hash (#85774)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
Update the pinned torchdynamo hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85774
Approved by: https://github.com/pytorchbot, https://github.com/malfet
2022-10-05 17:02:33 +00:00
Wang, Eikan
45be74cc63 Optimize to if the datatyep of the source tensor is as same as the dest datatype (#85140)
The AMP inserts `_autocast_to_reduced_precision` and `_autocast_to_full_precision` automatically. The aten implementation provides a fast path to bypass the conversion if the tensor data type has been the reduced/full precision. But NNC always does the conversion which could bring >5% E2E performance regression.

This PR is to address the performance issue like aten. We will not pull `_autocast_to_reduced_precision` and `_autocast_to_full_precision` into NNC fusion group and fallback to aten to trigger its fast path if the tensor data type has been the reduced/full precision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85140
Approved by: https://github.com/frank-wei
2022-09-27 04:40:42 +00:00
Wang, Eikan
a531a604a0 Support BF16ImmPtr (#84041)
- To support BF16 Immediate value by converting it to uint16. The behavior is as same as BF16 tensor
- Enable BF16 test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84041
Approved by: https://github.com/ZolotukhinM
2022-09-24 11:58:43 +00:00
lezcano
2a88f1b2d8 Land "Make ceil,floor,round,trunc handle integers" (#85144)
PR to land https://github.com/pytorch/pytorch/pull/78480, as Rohit does
not work in the PyTorch project anymore
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85144
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-09-21 17:23:47 +00:00
Wu, Chunyuan
ca419c3338 [NNC] add eltwise OPs: mish and elu (#80586)
Enable more eltwise OPs in NNC:

- mish
- elu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80586
Approved by: https://github.com/ZolotukhinM, https://github.com/malfet
2022-09-17 01:44:34 +00:00
richard
0918154967 Supports symbolic diff for silu (#81724)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81724
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
2022-08-09 01:18:10 +00:00
David Berard
b721ea9b82 [NNC] Skip buildShapeExpressions if ConstantChunk input shapes are unknown (#82698)
buildShapeExpressions skips shape building for nodes if their inputs are
unknown.

Before prim::ConstantChunk ops were not skipped if their inputs were
unknown, which caused issues for graphs like:

```
graph(%x.1 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu)):
  %2 : Long(requires_grad=0, device=cpu) = prim::Constant[value={4}]() # skip, constants unsupported
  %3 : int = prim::Constant[value=1]() # skip, constants unsupported
  %4 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu) = aten::add(%x.1, %y.1, %3) # calculate
  %5 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu) = aten::add(%4, %2, %3) # skip, because %2 doesn't have shapes defined in the map
  %6 : Float(4, 2, strides=[4, 1], requires_grad=0, device=cpu), %7 : Float(4, 2, strides=[4, 1], requires_grad=0, device=cpu) = prim::ConstantChunk[chunks=2, dim=1](%5) # <-- FAIL because %5 isn't defined
  %8 : Float(4, 2, strides=[2, 1], requires_grad=0, device=cpu) = aten::mul(%6, %7) # ...
```

(buildShapeExpressions would fail with std::out_of_range because the value was not found in the shapes map)

This moves the skip logic before the prim::ConstantChunk case to avoid this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82698
Approved by: https://github.com/eellison
2022-08-04 23:49:10 +00:00
Peter Bell
9bf52f4be8 Add OpInfo for torch.equal and fix support for non-standard bools
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79389

Approved by: https://github.com/mruberry
2022-06-20 23:48:39 +00:00
goldenxuett
eb49dde9cf Disable TracerWarnings on NNC opinfo tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78756

Approved by: https://github.com/davidberard98
2022-06-03 18:11:12 +00:00
Mike Ruberry
bb8baea932 [primTorch] flatten, squeeze, unsqueeze... (#77043)
This PR ...

Makes the following testing changes:

- Updates stride testing in test_python_reference_consistency to only check strides of dimensions with length > 1
- Creates reference inputs for reshape
- Creates reference inputs for chunk
- Extends the sample inputs for unsqueeze
- Extends the sample inputs for stack -- test_conj_view and test_neg_view are now xfailed
  - https://github.com/pytorch/pytorch/issues/77046

Makes the following architecture changes:
- Adds the refs.special (sub)module
- Adds the refs.nn.functional (sub)module

Adds the following prims:
- expand_dims
- view_of
- rev
- clone

Adds the following references:
  -  flatten
  - squeeze
  - unsqueeze
  - special.i0e
  - special.i1e
  - logical_or
  - logical_and
  - isclose
  - flip
  - stack
  - nn.functional.elu
  - chunk
  - clone
  - narrow

Identifies the following bugs in PyTorch today:
- https://github.com/pytorch/pytorch/issues/77054
- https://github.com/pytorch/pytorch/issues/77055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77043
Approved by: https://github.com/ngimel
2022-05-09 11:24:55 +00:00
Elias Ellison
0d7be81c9c [JIT] Add Context Manager to force strict fusion
Fixes https://github.com/pytorch/pytorch/issues/75464 Adds a context manager that will throw if the ops in the context are not fused.

API is :
```
with torch.jit.strict_fusion():
    ...
```

A few TODOs:
[+] Compose/figure out how to do with autodiff - right now it will run on autodiff as well
[+] Support all of the nvfuser operators that are added in guarding
[+] Figure out what to do with control flow that isn't taken (right now it will just error). this is probably a source of the original issue :/  - will just error
[+] (After those are figured out) add to docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777
Approved by: https://github.com/davidberard98
2022-04-25 16:08:57 +00:00
David Berard
1324410f2e [JIT] Reuse traced fn for jit opinfos
Previously, jit opinfos would only run the traced function once. This is a problem for NNC and NVFuser, where the fused implementation only runs on the second invocation.

This caches the traced function and calls the cached implementation, so that subsequent calls actually perform fusion and use the fused implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76000

Approved by: https://github.com/eellison
2022-04-22 20:14:29 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializtions.

The last mode is redundant with getGraphOptimize, we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does mean if getProfilingMode is true but getExecutor is set to false ? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so i'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
David Berard
f685dfaac1 [JIT] call super().setUp() in test_jit_fuser_te.py (#73762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73762

TestCase.setUp() controls slowTest behavior, so calling super().setUp() will prevent fast tests from running in the slow test CI jobs.

example: https://github.com/pytorch/pytorch/runs/5413135014?check_suite_focus=true: despite PYTORCH_TEST_SKIP_FAST=1, TestTEFuserStatic tests are still running

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D34628769

Pulled By: davidberard98

fbshipit-source-id: 84311ec1db2ac60fcafb7b77f377e9ae2ef792e3
(cherry picked from commit 67fdba7fb9b73ce2b9119f4c4bc84e5b38041e21)
2022-03-11 01:03:54 +00:00
Ryan Spring
4f8b986e28 Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: VitalyFedyunin

Differential Revision: D33894937

Pulled By: jbschlosser

fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851
(cherry picked from commit 6e986f91a9)
2022-02-14 03:40:32 +00:00
David Berard
2e04295790 [tensorexpr] support for fusing autocasting ops (#72478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72478

aten::_autocast_to_reduced_precision and `aten::_autocast_to_full_precision are essentially just aten::to operations, so they can be fused the same way aten::to is fused.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D34057522

Pulled By: davidberard98

fbshipit-source-id: f3b53641415702a4ac56460587801b9c76d81b3c
(cherry picked from commit 838ce5542e)
2022-02-10 18:12:36 +00:00
David Berard
bbd42c605a [JIT] Opinfo tests for nnc fusion - retry (#72486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72486

Retry #70465.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34061628

Pulled By: davidberard98

fbshipit-source-id: e27ed315bc4ad57cdbfbc9cedffcbb7886004524
(cherry picked from commit 7937808d2e)
2022-02-09 19:01:22 +00:00
Nikita Shulga
bb101ec78d Revert D33595240: [JIT] Opinfo tests for nnc fusion
Test Plan: revert-hammer

Differential Revision:
D33595240 (0b57bd4c66)

Original commit changeset: e2e17a921bc3

Original Phabricator Diff: D33595240 (0b57bd4c66)

fbshipit-source-id: 172a3ffd19d180b1b3617956b1f881be62f37bc9
(cherry picked from commit 324cfaea86)
2022-02-08 01:28:42 +00:00
David Berard
0b57bd4c66 [JIT] Opinfo tests for nnc fusion (#70465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465

These tests check to ensure that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op
(b) for certain ops where fusion is expected to occur, ensure that
fusion does actually occur

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D33595240

Pulled By: davidberard98

fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
2022-02-07 20:56:21 +00:00
Elias Ellison
defde3bb04 [NNC] Use index for stride mapping in kernel.cpp (#72266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72266

Within the kernel, we may manipulate `Value *` in `OptimizeCat`, which would invalidate the input `Value *` -> Stride mapping.

Fix for https://github.com/pytorch/pytorch/issues/72173

Test Plan: Imported from OSS

Reviewed By: dagitses, davidberard98

Differential Revision: D33986306

Pulled By: eellison

fbshipit-source-id: dc33cd2b545e49e90d1e46b9fcf1e6dbb4b829db
(cherry picked from commit 5e4555968a)
2022-02-04 00:12:38 +00:00