Commit Graph

220 Commits

Author SHA1 Message Date
David Berard
405f014c26 [jit] Skip NNAPI, test_ivalue, CPU NNC tests in fbcode (#108937)
Summary:
NNAPI: Internal test infra can't find test_nnapi.py. Easiest solution is to just skip these tests if test_nnapi.py can't be found
test_ivalue: fails due to qscheme op not implemented for CPU backend. In OSS, it doesn't run because it's not included in test_jit.py.
CPU NNC tests: test_working_byte_cpu_float32 is failing, but hard to repro; we don't use CPU NNC internally, so let's just skip CPU NNC tests internally.

Differential Revision: D48041615

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108937
Approved by: https://github.com/eellison
2023-09-11 22:42:30 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Justin Chu
79c9e82e27 Fix flake8 lint errors reported by ruff - take 2 (#99798)
Replaces #99784. This PR is pure autofix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-04-23 23:09:51 +00:00
Catherine Lee
1f85390eb2 Skip test_batch_norm in test_jit_fuser_te for asan (#98016)
it takes 10+ minutes on asan?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98016
Approved by: https://github.com/huydhn
2023-03-30 20:58:41 +00:00
Jason Ansel
ae57bd6630 PT2/TorchScript interoperability fix (#94678)
Allows torch.compile() to inline into ScriptFunction

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94678
Approved by: https://github.com/ezyang
2023-02-15 01:21:10 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Aaron Gokaslan
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This changes replace all remaining unnecessary generator expressions with list/dict/set comprehensions which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as 'set(a for a in b)`, resolving it into just the set call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
PyTorch MergeBot
913cf2908e Revert "Disable torch_jit_fuser_te for dynamo CI (#92945)"
This reverts commit 0fc2f9febb.

Reverted https://github.com/pytorch/pytorch/pull/92945 on behalf of https://github.com/huydhn due to The test looks ok now after moving dynamo shard to 3.8 https://github.com/pytorch/pytorch/issues/92942, so trying to re-enable it
2023-01-26 21:41:17 +00:00
Nikita Shulga
0fc2f9febb Disable torch_jit_fuser_te for dynamo CI (#92945)
Not clear, what caused SIGIOT, but we need to get signal from other tests (and NNC+Dynamo is probably not the most important usecase)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92945
Approved by: https://github.com/ezyang, https://github.com/huydhn
2023-01-25 05:02:59 +00:00
David Berard
c4a6f21b50 [JIT] Add tests for pow() with different dtype inputs (#91946)
Fixes #75476

Apparently this NNC bug has been fixed at some point. Adding tests to track this and verify via CI that this is actually fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91946
Approved by: https://github.com/qihqi
2023-01-12 19:39:55 +00:00
Sergii Dymchenko
00118f5c30 Fix issue 38095 TODO in test_jit_fuser_te.py (#90246)
Fix TODO related to https://github.com/pytorch/pytorch/issues/38095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90246
Approved by: https://github.com/clee2000
2022-12-08 01:39:26 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Philip Meier
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
anjali411
9eb771583c symintify rand and randint functions and meta suport for randint (#86358)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86358
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-10-10 17:07:11 +00:00
PyTorch MergeBot
7f607e8cb5 [torchdynamo hash update] update the pinned torchdynamo hash (#85774)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
Update the pinned torchdynamo hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85774
Approved by: https://github.com/pytorchbot, https://github.com/malfet
2022-10-05 17:02:33 +00:00
Wang, Eikan
45be74cc63 Optimize to if the datatyep of the source tensor is as same as the dest datatype (#85140)
The AMP inserts `_autocast_to_reduced_precision` and `_autocast_to_full_precision` automatically. The aten implementation provides a fast path to bypass the conversion if the tensor data type has been the reduced/full precision. But NNC always does the conversion which could bring >5% E2E performance regression.

This PR is to address the performance issue like aten. We will not pull `_autocast_to_reduced_precision` and `_autocast_to_full_precision` into NNC fusion group and fallback to aten to trigger its fast path if the tensor data type has been the reduced/full precision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85140
Approved by: https://github.com/frank-wei
2022-09-27 04:40:42 +00:00
Wang, Eikan
a531a604a0 Support BF16ImmPtr (#84041)
- To support BF16 Immediate value by converting it to uint16. The behavior is as same as BF16 tensor
- Enable BF16 test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84041
Approved by: https://github.com/ZolotukhinM
2022-09-24 11:58:43 +00:00
lezcano
2a88f1b2d8 Land "Make ceil,floor,round,trunc handle integers" (#85144)
PR to land https://github.com/pytorch/pytorch/pull/78480, as Rohit does
not work in the PyTorch project anymore
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85144
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-09-21 17:23:47 +00:00
Wu, Chunyuan
ca419c3338 [NNC] add eltwise OPs: mish and elu (#80586)
Enable more eltwise OPs in NNC:

- mish
- elu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80586
Approved by: https://github.com/ZolotukhinM, https://github.com/malfet
2022-09-17 01:44:34 +00:00
richard
0918154967 Supports symbolic diff for silu (#81724)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81724
Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
2022-08-09 01:18:10 +00:00
David Berard
b721ea9b82 [NNC] Skip buildShapeExpressions if ConstantChunk input shapes are unknown (#82698)
buildShapeExpressions skips shape building for nodes if their inputs are
unknown.

Before prim::ConstantChunk ops were not skipped if their inputs were
unknown, which caused issues for graphs like:

```
graph(%x.1 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu)):
  %2 : Long(requires_grad=0, device=cpu) = prim::Constant[value={4}]() # skip, constants unsupported
  %3 : int = prim::Constant[value=1]() # skip, constants unsupported
  %4 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu) = aten::add(%x.1, %y.1, %3) # calculate
  %5 : Float(4, 4, strides=[4, 1], requires_grad=0, device=cpu) = aten::add(%4, %2, %3) # skip, because %2 doesn't have shapes defined in the map
  %6 : Float(4, 2, strides=[4, 1], requires_grad=0, device=cpu), %7 : Float(4, 2, strides=[4, 1], requires_grad=0, device=cpu) = prim::ConstantChunk[chunks=2, dim=1](%5) # <-- FAIL because %5 isn't defined
  %8 : Float(4, 2, strides=[2, 1], requires_grad=0, device=cpu) = aten::mul(%6, %7) # ...
```

(buildShapeExpressions would fail with std::out_of_range because the value was not found in the shapes map)

This moves the skip logic before the prim::ConstantChunk case to avoid this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82698
Approved by: https://github.com/eellison
2022-08-04 23:49:10 +00:00
Peter Bell
9bf52f4be8 Add OpInfo for torch.equal and fix support for non-standard bools
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79389

Approved by: https://github.com/mruberry
2022-06-20 23:48:39 +00:00
goldenxuett
eb49dde9cf Disable TracerWarnings on NNC opinfo tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78756

Approved by: https://github.com/davidberard98
2022-06-03 18:11:12 +00:00
Mike Ruberry
bb8baea932 [primTorch] flatten, squeeze, unsqueeze... (#77043)
This PR ...

Makes the following testing changes:

- Updates stride testing in test_python_reference_consistency to only check strides of dimensions with length > 1
- Creates reference inputs for reshape
- Creates reference inputs for chunk
- Extends the sample inputs for unsqueeze
- Extends the sample inputs for stack -- test_conj_view and test_neg_view are now xfailed
  - https://github.com/pytorch/pytorch/issues/77046

Makes the following architecture changes:
- Adds the refs.special (sub)module
- Adds the refs.nn.functional (sub)module

Adds the following prims:
- expand_dims
- view_of
- rev
- clone

Adds the following references:
  -  flatten
  - squeeze
  - unsqueeze
  - special.i0e
  - special.i1e
  - logical_or
  - logical_and
  - isclose
  - flip
  - stack
  - nn.functional.elu
  - chunk
  - clone
  - narrow

Identifies the following bugs in PyTorch today:
- https://github.com/pytorch/pytorch/issues/77054
- https://github.com/pytorch/pytorch/issues/77055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77043
Approved by: https://github.com/ngimel
2022-05-09 11:24:55 +00:00
Elias Ellison
0d7be81c9c [JIT] Add Context Manager to force strict fusion
Fixes https://github.com/pytorch/pytorch/issues/75464 Adds a context manager that will throw if the ops in the context are not fused.

API is :
```
with torch.jit.strict_fusion():
    ...
```

A few TODOs:
[+] Compose/figure out how to do with autodiff - right now it will run on autodiff as well
[+] Support all of the nvfuser operators that are added in guarding
[+] Figure out what to do with control flow that isn't taken (right now it will just error). this is probably a source of the original issue :/  - will just error
[+] (After those are figured out) add to docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777
Approved by: https://github.com/davidberard98
2022-04-25 16:08:57 +00:00
David Berard
1324410f2e [JIT] Reuse traced fn for jit opinfos
Previously, jit opinfos would only run the traced function once. This is a problem for NNC and NVFuser, where the fused implementation only runs on the second invocation.

This caches the traced function and calls the cached implementation, so that subsequent calls actually perform fusion and use the fused implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76000

Approved by: https://github.com/eellison
2022-04-22 20:14:29 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializtions.

The last mode is redundant with getGraphOptimize, we should just remove it and use getGraphOptimize in these cases. It would lead to potentially invalid combinations of logic - what does mean if getProfilingMode is true but getExecutor is set to false ? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so i'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
David Berard
f685dfaac1 [JIT] call super().setUp() in test_jit_fuser_te.py (#73762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73762

TestCase.setUp() controls slowTest behavior, so calling super().setUp() will prevent fast tests from running in the slow test CI jobs.

example: https://github.com/pytorch/pytorch/runs/5413135014?check_suite_focus=true: despite PYTORCH_TEST_SKIP_FAST=1, TestTEFuserStatic tests are still running

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D34628769

Pulled By: davidberard98

fbshipit-source-id: 84311ec1db2ac60fcafb7b77f377e9ae2ef792e3
(cherry picked from commit 67fdba7fb9b73ce2b9119f4c4bc84e5b38041e21)
2022-03-11 01:03:54 +00:00
Ryan Spring
4f8b986e28 Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: VitalyFedyunin

Differential Revision: D33894937

Pulled By: jbschlosser

fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851
(cherry picked from commit 6e986f91a9)
2022-02-14 03:40:32 +00:00
David Berard
2e04295790 [tensorexpr] support for fusing autocasting ops (#72478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72478

aten::_autocast_to_reduced_precision and `aten::_autocast_to_full_precision are essentially just aten::to operations, so they can be fused the same way aten::to is fused.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D34057522

Pulled By: davidberard98

fbshipit-source-id: f3b53641415702a4ac56460587801b9c76d81b3c
(cherry picked from commit 838ce5542e)
2022-02-10 18:12:36 +00:00
David Berard
bbd42c605a [JIT] Opinfo tests for nnc fusion - retry (#72486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72486

Retry #70465.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34061628

Pulled By: davidberard98

fbshipit-source-id: e27ed315bc4ad57cdbfbc9cedffcbb7886004524
(cherry picked from commit 7937808d2e)
2022-02-09 19:01:22 +00:00
Nikita Shulga
bb101ec78d Revert D33595240: [JIT] Opinfo tests for nnc fusion
Test Plan: revert-hammer

Differential Revision:
D33595240 (0b57bd4c66)

Original commit changeset: e2e17a921bc3

Original Phabricator Diff: D33595240 (0b57bd4c66)

fbshipit-source-id: 172a3ffd19d180b1b3617956b1f881be62f37bc9
(cherry picked from commit 324cfaea86)
2022-02-08 01:28:42 +00:00
David Berard
0b57bd4c66 [JIT] Opinfo tests for nnc fusion (#70465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465

These tests check to ensure that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op
(b) for certain ops where fusion is expected to occur, ensure that
fusion does actually occur

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D33595240

Pulled By: davidberard98

fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
2022-02-07 20:56:21 +00:00
Elias Ellison
defde3bb04 [NNC] Use index for stride mapping in kernel.cpp (#72266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72266

Within the kernel, we may manipulate `Value *` in `OptimizeCat`, which would invalidate the input `Value *` -> Stride mapping.

Fix for https://github.com/pytorch/pytorch/issues/72173

Test Plan: Imported from OSS

Reviewed By: dagitses, davidberard98

Differential Revision: D33986306

Pulled By: eellison

fbshipit-source-id: dc33cd2b545e49e90d1e46b9fcf1e6dbb4b829db
(cherry picked from commit 5e4555968a)
2022-02-04 00:12:38 +00:00
Elias Ellison
aa99df5cc3 Check for grad mode enabled in dynamic shape fusion check (#72161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72161

Following logic here: 3dce68fdf4/aten/src/ATen/core/tensor_type.cpp (L329)

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33934368

Pulled By: eellison

fbshipit-source-id: 8555ef72070559905f65c6e883a7ae49e5bbbdc3
(cherry picked from commit 1db78befd6)
2022-02-02 04:40:22 +00:00
Elias Ellison
27a4d39756 NNC Dynamic Channels last fixes (#72032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72032

This contains a few channels last changes from benchmarking:
- dont permute back to channels last on dynamic, cpu, perf is not good, and use cases for it are exotic atm
- remove the conditional one handling in permutting channels last symbolic tensor on cuda, it's not needed in the permutation case as tests show
- removing logic in torch/csrc/jit/tensorexpr/loopnest.cpp preventing inlining. the condition in checks is always valid given valid construction of ir

I can split up as needed.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33864652

Pulled By: eellison

fbshipit-source-id: f16674fb02dfff22670d8a2f856c5a317fd15717
(cherry picked from commit a9a0697839)
2022-02-01 19:07:02 +00:00
Elias Ellison
59a6375639 [NNC] Add Tests for Dynamic Shape Fusion Change default fusion strategy (#71651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71651

The only tests that regress are because chunk NYI, the other tests that I touched were passing just because the `assertAllFused` wasn't working correctly. That, and we're no longer compiling conv/matmul w dynamic shapes

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33801500

Pulled By: eellison

fbshipit-source-id: 074118ab4a975b7db876a4fcdfb9483afb879e79
(cherry picked from commit abaa7948c1)
2022-02-01 19:07:02 +00:00
Elias Ellison
f1499d6c18 Refactor PE so fusion specializations are configurable (#71650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71650

*

Refactors PE so there is a current fusion strategy set, which will take in a vector of e.g. [(STATIC, 2), (DYNAMIC, 10)] which means fuse two static invocations then fuse 10 dynamic ones, then stop specializing.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33801501

Pulled By: eellison

fbshipit-source-id: ebc7ac3c57e35a3b9bb15ab751f0aa1d25cc9bd5
(cherry picked from commit 8dd89088d3)
2022-02-01 19:07:02 +00:00
Elias Ellison
cf1833df70 [WIP] add explicit dynamic fusion arg (#71173)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71173

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33536222

Pulled By: eellison

fbshipit-source-id: a097408ecdd6e284432de128feb297993d882d52
(cherry picked from commit 0e3419b2d3)
2022-02-01 19:07:02 +00:00
Nikita Shulga
74c44ba9d6 Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33850228 (23d03025dc)

Original commit changeset: 3cc33fb298e4

Original Phabricator Diff: D33850228 (23d03025dc)

fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692
(cherry picked from commit c9efb58223)
2022-01-31 17:44:19 +00:00
Ryan Spring
23d03025dc Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: cpuhrsch

Differential Revision: D33850228

Pulled By: jbschlosser

fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33
(cherry picked from commit 3a53b3e94f)
2022-01-31 17:07:45 +00:00
Joel Schlosser
cb823d9f07 Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33744717 (f499ab9cef)

Original commit changeset: d64532a562ed

Original Phabricator Diff: D33744717 (f499ab9cef)

fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93
(cherry picked from commit e9fb2d1db1)
2022-01-28 18:35:01 +00:00
Ryan Spring
f499ab9cef Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: mikaylagawarecki

Differential Revision: D33744717

Pulled By: jbschlosser

fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187
(cherry picked from commit 4713dd9cca)
2022-01-28 16:59:09 +00:00
Mikhail Zolotukhin
bd6ec4efb4 [TensorExpr] Add lowerings for scalar binary ops (+,-,*,/,&,|,^,<<,>>,cmp). (#71298)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71298

Differential Revision:
D33576534
D33576534

Test Plan: Imported from OSS

Reviewed By: anjali411

Pulled By: ZolotukhinM

fbshipit-source-id: 93787b6f11180fcbfbacbb55e1bfb79700320a0e
(cherry picked from commit b2a8e83f97)
2022-01-26 06:32:51 +00:00
David Berard
8ba1ee6aa7 [tensorexpr][easy] add missing comma to test_jit_fuser_te.py (#71642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71642

Missing comma was causing string concatenation in a list of strings

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33713185

Pulled By: davidberard98

fbshipit-source-id: a2458629d78202713a5bb2f8c720ff9b81939c31
(cherry picked from commit b077598f1d)
2022-01-24 22:18:37 +00:00
Raghavan Raman
70c9146c40 [nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428)
Summary:
The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428

Reviewed By: samdow

Differential Revision: D33640374

Pulled By: navahgar

fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce11)
2022-01-19 23:21:24 +00:00
Elias Ellison
5480deb183 Add support for permutting dynamic fusion group outputs to channels last format (#70656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70656

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458650

Pulled By: eellison

fbshipit-source-id: f0c7d20743deac7a87f7c9176e60da8100aefe41
2022-01-12 09:11:34 -08:00
Elias Ellison
39be20f259 [JIT][NNC] Add handling of strides to dynamic shape support. (#70464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464

Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
  S_ONE, // STRIDE_ONE: packed
  S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
  S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
  S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensor and b) channels-last tensor. channels-last is a common case and we should optimize for it. additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern.

Output striding will be done in a follow up.

The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debugability and to make use of storing ivalues as attributes on nodes.

As an example:

```

%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21)```
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458649

Pulled By: eellison

fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
2022-01-12 09:11:31 -08:00