Commit Graph

247 Commits

Author SHA1 Message Date
Mike Ruberry
bb8baea932 [primTorch] flatten, squeeze, unsqueeze... (#77043)
This PR ...

Makes the following testing changes:

- Updates stride testing in test_python_reference_consistency to only check strides of dimensions with length > 1
- Creates reference inputs for reshape
- Creates reference inputs for chunk
- Extends the sample inputs for unsqueeze
- Extends the sample inputs for stack -- test_conj_view and test_neg_view are now xfailed
  - https://github.com/pytorch/pytorch/issues/77046

Makes the following architecture changes:
- Adds the refs.special (sub)module
- Adds the refs.nn.functional (sub)module

Adds the following prims:
- expand_dims
- view_of
- rev
- clone

Adds the following references:
- flatten
- squeeze
- unsqueeze
- special.i0e
- special.i1e
- logical_or
- logical_and
- isclose
- flip
- stack
- nn.functional.elu
- chunk
- clone
- narrow
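
As a rough illustration (not part of this PR's text), a reference is expected to be interchangeable with the corresponding torch op; assuming the references are importable as `torch._refs`, a consistency check might look like:

```python
import torch
import torch._refs as refs  # assumed module path for the Python references

x = torch.arange(24.0).reshape(2, 3, 4)

# The reference should agree with the eager op in values and shape.
assert torch.equal(refs.flatten(x, start_dim=1), torch.flatten(x, start_dim=1))

y = torch.randn(3, 1, 5)
assert refs.squeeze(y).shape == torch.squeeze(y).shape
assert refs.unsqueeze(y, 0).shape == torch.unsqueeze(y, 0).shape
```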

Identifies the following bugs in PyTorch today:
- https://github.com/pytorch/pytorch/issues/77054
- https://github.com/pytorch/pytorch/issues/77055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77043
Approved by: https://github.com/ngimel
2022-05-09 11:24:55 +00:00
Elias Ellison
0d7be81c9c [JIT] Add Context Manager to force strict fusion
Fixes https://github.com/pytorch/pytorch/issues/75464. Adds a context manager that will throw if the ops in the context are not fused.

The API is:
```
with torch.jit.strict_fusion():
    ...
```
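
A minimal usage sketch (assuming a scripted function and a backend/fuser configuration where the ops inside the block actually fuse; otherwise the context manager is expected to raise):

```python
import torch

@torch.jit.script
def fn(x, y):
    # Throws at runtime if x + y and relu do not end up in a fusion group.
    with torch.jit.strict_fusion():
        return torch.relu(x + y)
```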

A few TODOs:
[+] Compose/figure out how this interacts with autodiff - right now it will run on autodiff as well
[+] Support all of the nvfuser operators that are added in guarding
[+] Figure out what to do with control flow that isn't taken (right now it will just error); this is probably a source of the original issue
[+] (After those are figured out) add to docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777
Approved by: https://github.com/davidberard98
2022-04-25 16:08:57 +00:00
David Berard
1324410f2e [JIT] Reuse traced fn for jit opinfos
Previously, jit opinfos would only run the traced function once. This is a problem for NNC and NVFuser, where the fused implementation only runs on the second invocation.

This caches the traced function and calls the cached implementation, so that subsequent calls actually perform fusion and use the fused implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76000

Approved by: https://github.com/eellison
2022-04-22 20:14:29 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Elias Ellison
6694fdaccd Clean up profiling mode and profiling executor strategy (#73875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73875

Previously we had a few settings:
- getExecutor - which toggled between Profiling Executor and Legacy
- getGraphOptimize - if true, overrides PE/Legacy to run with simple executor (no optimizations)
and then...
- getProfilingMode - which would set PE to 0 specializations.

The last mode is redundant with getGraphOptimize; we should just remove it and use getGraphOptimize in these cases. It also allows potentially invalid combinations of logic - what does it mean if getProfilingMode is true but getExecutor is set to false? This would lead to a bug in specialize_autograd_zero in this case, see: https://github.com/pytorch/pytorch/blob/master/torch%2Fcsrc%2Fjit%2Fpasses%2Fspecialize_autogradzero.cpp#L93.

The tests here are failing but get fixed with the PR above it, so I'll squash for landing.

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D34938130

Pulled By: eellison

fbshipit-source-id: 1a9c0ae7f6d1cfddc2ed3499a5af611053ae5e1b
(cherry picked from commit cf69ce3d155ba7d334022c42fb2cee54bb088c23)
2022-03-29 18:38:51 +00:00
David Berard
f685dfaac1 [JIT] call super().setUp() in test_jit_fuser_te.py (#73762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73762

TestCase.setUp() controls slowTest behavior, so calling super().setUp() will prevent fast tests from running in the slow test CI jobs.

example: https://github.com/pytorch/pytorch/runs/5413135014?check_suite_focus=true: despite PYTORCH_TEST_SKIP_FAST=1, TestTEFuserStatic tests are still running

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D34628769

Pulled By: davidberard98

fbshipit-source-id: 84311ec1db2ac60fcafb7b77f377e9ae2ef792e3
(cherry picked from commit 67fdba7fb9b73ce2b9119f4c4bc84e5b38041e21)
2022-03-11 01:03:54 +00:00
Ryan Spring
4f8b986e28 Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```
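
Assuming the flag is exposed through `torch.nn.functional.gelu` (as in current PyTorch), usage looks roughly like:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8)
exact = F.gelu(x)                       # erf-based gelu (approximate='none')
approx = F.gelu(x, approximate='tanh')  # tanh approximation from the snippet above
print((exact - approx).abs().max())     # small, but nonzero, difference
```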

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: VitalyFedyunin

Differential Revision: D33894937

Pulled By: jbschlosser

fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851
(cherry picked from commit 6e986f91a9)
2022-02-14 03:40:32 +00:00
David Berard
2e04295790 [tensorexpr] support for fusing autocasting ops (#72478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72478

`aten::_autocast_to_reduced_precision` and `aten::_autocast_to_full_precision` are essentially just `aten::to` operations, so they can be fused the same way `aten::to` is fused.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D34057522

Pulled By: davidberard98

fbshipit-source-id: f3b53641415702a4ac56460587801b9c76d81b3c
(cherry picked from commit 838ce5542e)
2022-02-10 18:12:36 +00:00
David Berard
bbd42c605a [JIT] Opinfo tests for nnc fusion - retry (#72486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72486

Retry #70465.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34061628

Pulled By: davidberard98

fbshipit-source-id: e27ed315bc4ad57cdbfbc9cedffcbb7886004524
(cherry picked from commit 7937808d2e)
2022-02-09 19:01:22 +00:00
Nikita Shulga
bb101ec78d Revert D33595240: [JIT] Opinfo tests for nnc fusion
Test Plan: revert-hammer

Differential Revision:
D33595240 (0b57bd4c66)

Original commit changeset: e2e17a921bc3

Original Phabricator Diff: D33595240 (0b57bd4c66)

fbshipit-source-id: 172a3ffd19d180b1b3617956b1f881be62f37bc9
(cherry picked from commit 324cfaea86)
2022-02-08 01:28:42 +00:00
David Berard
0b57bd4c66 [JIT] Opinfo tests for nnc fusion (#70465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465

These tests check to ensure that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op
(b) for certain ops where fusion is expected to occur, ensure that
fusion does actually occur

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D33595240

Pulled By: davidberard98

fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
2022-02-07 20:56:21 +00:00
Elias Ellison
defde3bb04 [NNC] Use index for stride mapping in kernel.cpp (#72266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72266

Within the kernel, we may manipulate `Value *` in `OptimizeCat`, which would invalidate the input `Value *` -> Stride mapping.

Fix for https://github.com/pytorch/pytorch/issues/72173

Test Plan: Imported from OSS

Reviewed By: dagitses, davidberard98

Differential Revision: D33986306

Pulled By: eellison

fbshipit-source-id: dc33cd2b545e49e90d1e46b9fcf1e6dbb4b829db
(cherry picked from commit 5e4555968a)
2022-02-04 00:12:38 +00:00
Elias Ellison
aa99df5cc3 Check for grad mode enabled in dynamic shape fusion check (#72161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72161

Following logic here: 3dce68fdf4/aten/src/ATen/core/tensor_type.cpp (L329)

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33934368

Pulled By: eellison

fbshipit-source-id: 8555ef72070559905f65c6e883a7ae49e5bbbdc3
(cherry picked from commit 1db78befd6)
2022-02-02 04:40:22 +00:00
Elias Ellison
27a4d39756 NNC Dynamic Channels last fixes (#72032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72032

This contains a few channels-last changes from benchmarking:
- Don't permute back to channels last for dynamic shapes on CPU; perf is not good, and use cases for it are exotic at the moment
- Remove the conditional-one handling when permuting a channels-last symbolic tensor on CUDA; it's not needed in the permutation case, as the tests show
- Remove the logic in torch/csrc/jit/tensorexpr/loopnest.cpp preventing inlining; the condition it checks is always valid given valid construction of the IR

I can split this up as needed.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33864652

Pulled By: eellison

fbshipit-source-id: f16674fb02dfff22670d8a2f856c5a317fd15717
(cherry picked from commit a9a0697839)
2022-02-01 19:07:02 +00:00
Elias Ellison
59a6375639 [NNC] Add Tests for Dynamic Shape Fusion Change default fusion strategy (#71651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71651

The only tests that regress do so because chunk is not yet implemented; the other tests that I touched were passing only because `assertAllFused` wasn't working correctly. That, and we're no longer compiling conv/matmul with dynamic shapes.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33801500

Pulled By: eellison

fbshipit-source-id: 074118ab4a975b7db876a4fcdfb9483afb879e79
(cherry picked from commit abaa7948c1)
2022-02-01 19:07:02 +00:00
Elias Ellison
f1499d6c18 Refactor PE so fusion specializations are configurable (#71650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71650

Refactors the profiling executor (PE) so there is a current fusion strategy, which takes a vector such as [(STATIC, 2), (DYNAMIC, 10)], meaning: fuse two static-shape invocations, then fuse 10 dynamic ones, then stop specializing.
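
A sketch of what configuring this could look like from Python; `torch.jit.set_fusion_strategy` is assumed here as the entry point, the commit itself only describes the underlying (type, depth) vector:

```python
import torch

# Specialize/fuse for 2 static-shape invocations, then 10 dynamic-shape ones,
# then stop specializing.
torch.jit.set_fusion_strategy([("STATIC", 2), ("DYNAMIC", 10)])
```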

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33801501

Pulled By: eellison

fbshipit-source-id: ebc7ac3c57e35a3b9bb15ab751f0aa1d25cc9bd5
(cherry picked from commit 8dd89088d3)
2022-02-01 19:07:02 +00:00
Elias Ellison
cf1833df70 [WIP] add explicit dynamic fusion arg (#71173)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71173

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33536222

Pulled By: eellison

fbshipit-source-id: a097408ecdd6e284432de128feb297993d882d52
(cherry picked from commit 0e3419b2d3)
2022-02-01 19:07:02 +00:00
Nikita Shulga
74c44ba9d6 Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33850228 (23d03025dc)

Original commit changeset: 3cc33fb298e4

Original Phabricator Diff: D33850228 (23d03025dc)

fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692
(cherry picked from commit c9efb58223)
2022-01-31 17:44:19 +00:00
Ryan Spring
23d03025dc Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: cpuhrsch

Differential Revision: D33850228

Pulled By: jbschlosser

fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33
(cherry picked from commit 3a53b3e94f)
2022-01-31 17:07:45 +00:00
Joel Schlosser
cb823d9f07 Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33744717 (f499ab9cef)

Original commit changeset: d64532a562ed

Original Phabricator Diff: D33744717 (f499ab9cef)

fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93
(cherry picked from commit e9fb2d1db1)
2022-01-28 18:35:01 +00:00
Ryan Spring
f499ab9cef Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds an `approximate` flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enables Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: mikaylagawarecki

Differential Revision: D33744717

Pulled By: jbschlosser

fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187
(cherry picked from commit 4713dd9cca)
2022-01-28 16:59:09 +00:00
Mikhail Zolotukhin
bd6ec4efb4 [TensorExpr] Add lowerings for scalar binary ops (+,-,*,/,&,|,^,<<,>>,cmp). (#71298)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71298

Differential Revision:
D33576534

Test Plan: Imported from OSS

Reviewed By: anjali411

Pulled By: ZolotukhinM

fbshipit-source-id: 93787b6f11180fcbfbacbb55e1bfb79700320a0e
(cherry picked from commit b2a8e83f97)
2022-01-26 06:32:51 +00:00
David Berard
8ba1ee6aa7 [tensorexpr][easy] add missing comma to test_jit_fuser_te.py (#71642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71642

A missing comma was causing implicit string concatenation in a list of strings.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33713185

Pulled By: davidberard98

fbshipit-source-id: a2458629d78202713a5bb2f8c720ff9b81939c31
(cherry picked from commit b077598f1d)
2022-01-24 22:18:37 +00:00
Raghavan Raman
70c9146c40 [nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428)
Summary:
The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428

Reviewed By: samdow

Differential Revision: D33640374

Pulled By: navahgar

fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce11)
2022-01-19 23:21:24 +00:00
Elias Ellison
5480deb183 Add support for permuting dynamic fusion group outputs to channels last format (#70656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70656

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458650

Pulled By: eellison

fbshipit-source-id: f0c7d20743deac7a87f7c9176e60da8100aefe41
2022-01-12 09:11:34 -08:00
Elias Ellison
39be20f259 [JIT][NNC] Add handling of strides to dynamic shape support. (#70464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464

Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
  S_ONE, // STRIDE_ONE: packed
  S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
  S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
  S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensor and b) channels-last tensor. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check whether tensors follow this pattern.

Output striding will be done in a follow up.

The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing ivalues as attributes on nodes.

As an example:

```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]]](%x, %24, %23, %22, %21)
```
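
For reference, a small sketch (not from this PR) of the contiguous vs. channels-last distinction these specializations rely on; the strides match the IR dump above:

```python
import torch

x = torch.randn(10, 11, 12, 13)                   # NCHW, contiguous
cl = x.to(memory_format=torch.channels_last)      # NHWC in memory

print(x.is_contiguous())                                    # True
print(cl.is_contiguous(memory_format=torch.channels_last))  # True
print(cl.stride())                                          # (1716, 1, 143, 11)
```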

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D33458649

Pulled By: eellison

fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
2022-01-12 09:11:31 -08:00
Elias Ellison
0adc7cc546 Inline Fallback Functions For Debugging (#70463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70463

Fix for https://github.com/pytorch/pytorch/issues/52940

When we call inlining on a fallback function, insert the runtime optimized version of its graph.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, davidberard98

Differential Revision: D33458651

Pulled By: eellison

fbshipit-source-id: fd7e5e2b5273a1677014ba1a766538c3ee9cad76
2022-01-10 12:15:11 -08:00
Animesh Jain
6896b2d734 [NNC Testing] Randomized loop nest infrastructure (#70410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70410

Trying again after #70174 was reverted. Earlier, the env variable was read into a static var in C++, causing state to be retained and causing test failures. The static qualifier is removed in this PR.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33321435

fbshipit-source-id: 6d108eb00cac9150a142ccc3c9a65a1867dd7de4
2022-01-06 16:21:42 -08:00
Mikhail Zolotukhin
0ee663d2fa Revert D33234529: [NNC Testing] Randomized loop nest infrastructure
Test Plan: revert-hammer

Differential Revision:
D33234529 (1d094587ea)

Original commit changeset: 9019f1f1d4ca

Original Phabricator Diff: D33234529 (1d094587ea)

fbshipit-source-id: a79deca9f186299bf884587eb7d50af2464979fb
2021-12-23 23:11:23 -08:00
Animesh Jain
1d094587ea [NNC Testing] Randomized loop nest infrastructure (#70174)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70174

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33234529

fbshipit-source-id: 9019f1f1d4ca945c92bee401f7ec674b7d987de4
2021-12-22 22:07:39 -08:00
Mikhail Zolotukhin
3186d36972 [TensorExpr] Suppress TracerWarnings in test_unsupported in test_jit_fuser_te.py. (#68757)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68757

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600951

Pulled By: ZolotukhinM

fbshipit-source-id: 7b9859d7dee1e9803b8fde5d071890a72d30cec9
2021-11-30 00:06:36 -08:00
Samantha Andow
e86058559a Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid) (#67492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67492

Reviewed By: zou3519

Differential Revision: D32282580

Pulled By: samdow

fbshipit-source-id: 115afe790328577357a90117bede3b6502590441
2021-11-09 12:57:38 -08:00
Natalia Gimelshein
417dc7f86c Revert D32007691: [pytorch][PR] Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid)
Test Plan: revert-hammer

Differential Revision:
D32007691 (ea60e7d559)

Original commit changeset: 6cb14dc56e29

fbshipit-source-id: 9ef599ef07302fb521b1f413b989786adfa3576c
2021-11-08 21:16:53 -08:00
Samantha Andow
ea60e7d559 Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid) (#67492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67492

Reviewed By: mruberry

Differential Revision: D32007691

Pulled By: samdow

fbshipit-source-id: 6cb14dc56e296154e2f48249049c4d2fe4f4d10d
2021-11-08 14:30:50 -08:00
Richard Zou
05d1dcc14c Split channels_last test cases for tensor conversion OpInfos (#67368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67368

This PR adds an additional test variant for the tensor conversion
functions (bfloat16, char, long, ...) that tests channels_last. This is
because some backends (mostly just functorch right now) don't have
channels last handling and may want to test that separately from the
more general case of these operations.

Test Plan: - wait for tests

Reviewed By: mruberry

Differential Revision: D31972959

Pulled By: zou3519

fbshipit-source-id: 68fea46908b2cdfeb0607908898bb8f9ef25b264
2021-11-03 07:39:41 -07:00
Ivan Kobzarev
7fbcf79684 [tensorexpr][nnc] Support quantization (#66676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31676329

Pulled By: IvanKobzarev

fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22
2021-10-31 22:49:30 -07:00
Jane Xu
49251d05ec [skip ci] Set test owners for NNC tests (#66833)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66833

Reviewed By: albanD

Differential Revision: D31907812

Pulled By: janeyx99

fbshipit-source-id: 5e5013b4276fd208ac68d61cf787679799695602
2021-10-26 07:46:18 -07:00
Bert Maher
bdb889aca1 [nnc] Use a descriptive name for fused kernels when profiling (#66990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66990

NNC fusion groups currently show up as "TensorExpr" in the profiler,
which is true but not super useful since it obscures what's actually happening
in the fusion group.  This change will log them as `fused_XXX` where XXX is a
(length-limited) series of ops describing the subgraph, for instance
`fused_mul_add` to represent a group containing `aten::mul`, `aten::add`.
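
A hedged sketch of how the new names might surface in a profile (whether the group fuses, and the exact `fused_...` name, depend on the fuser configuration):

```python
import torch

@torch.jit.script
def f(a, b):
    return (a * b + b).relu()

a, b = torch.randn(1024), torch.randn(1024)
for _ in range(3):              # let the profiling executor specialize and fuse
    f(a, b)

with torch.autograd.profiler.profile() as prof:
    f(a, b)
# Fused groups should now show up as e.g. "fused_mul_add" instead of "TensorExpr".
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```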

Test Plan: New unit test to check the output of autograd profiler.

Reviewed By: dzhulgakov

Differential Revision: D31762087

fbshipit-source-id: 3fadbdc67b054faa01aa42e5b6ea2c4a6bc3481f
2021-10-21 00:06:23 -07:00
kshitij12345
49a1d7bfcb [opinfo] elemwise parcel : isfinite, isinf, isposinf, isneginf, isnan, isreal (#66400)
Summary:
Adds OpInfo for `isfinite, isinf, isposinf, isneginf, isnan, isreal`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66400

Reviewed By: bdhirsh

Differential Revision: D31602998

Pulled By: mruberry

fbshipit-source-id: 235cc414f373f014f4822a72deb1a04a58ad4a7c
2021-10-14 10:11:57 -07:00
Richard Zou
5d4452937d OpInfos for some Tensor dtype conversion methods (#64282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64282

OpInfos for:
- Tensor.bfloat16, Tensor.bool, Tensor.byte, Tensor.char
- Tensor.double, Tensor.float, Tensor.half, Tensor.int
- Tensor.short, Tensor.long

None of these are supported by TorchScript. Also, the OpInfo autograd
test runner assumes that the operation is not allowed to change the
dtype of the argument, so only Tensor.double has
`supports_autograd=True` (in theory Tensor.bfloat16, Tensor.float,
Tensor.half should be differentiable).

Test Plan: - run tests

Reviewed By: dagitses

Differential Revision: D31452627

Pulled By: zou3519

fbshipit-source-id: b7f272e558558412c47aefe947af7f060dfb45c5
2021-10-14 09:13:30 -07:00
Mikhail Zolotukhin
5f1518609b [TensorExpr] Fix lowering for aten::t. (#65859)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65859

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31289347

Pulled By: ZolotukhinM

fbshipit-source-id: b9648416238657fe23366928e43ed63e992a8973
2021-10-12 01:26:36 -07:00
Mikhail Zolotukhin
6864146f2b [TensorExpr] Fix lowerings for aten::view and aten::reshape. (#65852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65852

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31286024

Pulled By: ZolotukhinM

fbshipit-source-id: eb5b5f2ed86b6f325f09904e841815b8183b4e1d
2021-10-12 01:26:34 -07:00
jjsjann123
d609957c95 patching graph_for (#55139)
Summary:
Allows an individual DifferentiableGraphOp to display its optimized forward graph. This improves user visibility into graph mutation via optimization passes, especially fusion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55139

Reviewed By: albanD

Differential Revision: D31330909

Pulled By: dzhulgakov

fbshipit-source-id: c745b482fdc34876dc404cbe3bacd99dcf2ac724
2021-10-04 21:50:22 -07:00
Max Ren
0eaf081018 [JIT] canonicalize aten::rsub (#65014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65014

ghstack-source-id: 138656948

Test Plan:
```
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........s......................
----------------------------------------------------------------------
Ran 31 tests in 0.393s

OK (skipped=1)
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole.test_normalized_rsub
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
.
----------------------------------------------------------------------
Ran 1 test in 0.015s

OK
```

Reviewed By: eellison

Differential Revision: D30941389

fbshipit-source-id: 03f0416d99090845c9bfb1e5fcf771d5f1d7a050
2021-09-22 17:20:46 -07:00
Raghavan Raman
cad7a4b0ea [nnc] Added an implementation of sign op (#64033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30579197

Pulled By: navahgar

fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3
2021-09-10 16:49:04 -07:00
Animesh Jain
18d24bb537 [NNC] Add Softplus operator (#64589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64589

Adds a softplus operator lowering for NNC, and enables element-wise fusion for it as well.

Test Plan: Added a test in test_jit_fuser.py

Reviewed By: bertmaher

Differential Revision: D30736449

fbshipit-source-id: 6c5fc3bceb5cef2322ecd4449f827e4af018ea93
2021-09-08 10:49:58 -07:00
Elias Ellison
bccbe310ef Add view with negative dim (#63516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63516

How to review: pretty much just check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.
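
Presumably the negative-dim case being exercised is the usual -1 size inference, e.g.:

```python
import torch

x = torch.arange(24)
print(x.view(2, -1, 4).shape)   # torch.Size([2, 3, 4]); the -1 size is inferred
```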

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738143

Pulled By: eellison

fbshipit-source-id: c7cd01cb2c8a13cb2664415f3d98aedec19a8e07
2021-09-07 18:22:28 -07:00
Bert Maher
e7fb35021a [nnc] Enable fusion of bfloat16 ops (#64196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64196

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643864

Pulled By: bertmaher

fbshipit-source-id: e95edeaf7089464d713ea1d1f951743d3e5f61c5
2021-08-30 20:09:36 -07:00
Bert Maher
ebc0aacf83 [nnc] Fix half2float conversion and re-enable float16 (#64199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64199

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643865

Pulled By: bertmaher

fbshipit-source-id: 9de6adca53bd08839328cbaf6364f7de9550264b
2021-08-30 18:37:55 -07:00
Bert Maher
4f969db325 [nnc] Fix batchnorm implementation (#64112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64112

Fixes #64062

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D30622897

Pulled By: bertmaher

fbshipit-source-id: 7d7c6131aa786e61fa1d0a517288396a0bdb1d22
2021-08-28 19:20:35 -07:00
Bert Maher
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
Bert Maher
543130511a [nnc] Disable erf and erfc (#63775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63775

These introduce small accuracy differences that cause some internal
tests to fail, and it's not worth fixing the tests right now because they're
slower than the ATen ops anyways.
ghstack-source-id: 136526229

Test Plan:
```
buck test mode/dev //aml/eccv/mcm/training:tests -- --exact 'aml/eccv/mcm/training:tests - test_build_torch_script_model (aml.eccv.mcm.training.tests.publish_helper_tests.TransformerPredictorPublishHelperTests)'
```

Reviewed By: navahgar

Differential Revision: D30484557

fbshipit-source-id: 095a9c810539a499105b76e1d96843dbc61b0079
2021-08-24 18:55:45 -07:00
Bert Maher
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
Philip Meier
70a3210eca Add BinaryUfuncOpInfo and broadcasting tests (#61964)
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964

Reviewed By: ngimel

Differential Revision: D30407734

Pulled By: mruberry

fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033
2021-08-20 11:44:15 -07:00
Bert Maher
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
Philip Meier
99203580a9 Updates internal assert_allclose callsites in favor of assert_close (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841

Redo of #60863.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30408145

Pulled By: mruberry

fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
2021-08-19 12:50:41 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Kushashwa Ravi Shrimali
a705b8f08f OpInfo for nn.functional.relu (#62076)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62076

Reviewed By: soulitzer

Differential Revision: D30013262

Pulled By: zou3519

fbshipit-source-id: 7df5e930d1588146e09cf58c53c8860392da7348
2021-08-04 15:50:18 -07:00
Bert Maher
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
Bert Maher
1a0058f593 [nnc] Merge inconsistent profiling information (#60510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60510

We encountered a situation where loop unrolling caused us to duplicate
profiled tensor types in a manner that wasn't logically consistent (see the
attached test case).  When applying this profiling information, we need to
merge the profiled types so that we use a conservative (unspecialized) type.
ghstack-source-id: 132160002

Test Plan: new unit test, plus local predictor using P424983338

Reviewed By: Krovatkin

Differential Revision: D29322487

fbshipit-source-id: 4c18ee69c71bb0622c2e6f6aa361ab5613cbaca4
2021-06-23 17:05:32 -07:00
Mikhail Zolotukhin
d9e7df707b [TensorExpr] Add NNC lowerings for aten::mean, aten::addmm, and aten::adaptive_avg_pool2d. (#59347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59347

We had external call wrappers for them, but they were not used in NNC.
This PR adds lowerings using these ext calls and fixes some bugs in
them.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28853832

Pulled By: ZolotukhinM

fbshipit-source-id: 1718400368e1a9cf3f19180ee2290a4ed9c99d41
2021-06-18 11:56:32 -07:00
Bert Maher
469f0e42d6 [nnc] Handle more cases of excessive # of cat args (#60043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60043

And add a unit test

Test Plan: new unit test

Reviewed By: navahgar

Differential Revision: D29146547

fbshipit-source-id: 31532926032dbef70d163930f3d8be160f5eacc3
2021-06-15 18:19:52 -07:00
Mikhail Zolotukhin
daa35141e8 Reland: "[TensorExpr] Fix handling of 0-dim tensors." (#59508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508

An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28918342

Pulled By: ZolotukhinM

fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
2021-06-08 22:48:17 -07:00
kshitij12345
96ac0e0340 OpInfo: t (#59442)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59442

Reviewed By: agolynski

Differential Revision: D28898946

Pulled By: mruberry

fbshipit-source-id: be32429fa7306554e4912fdcc382593d00c9f4ad
2021-06-05 18:59:38 -07:00
Akifumi Imanishi
0a5bfa9919 Support __rmod__ (#58476)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58035.

This PR implements `torch.Tensor.__rmod__` and `torch.remainder(scalar, tensor)` for compatibility with NumPy's interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)

TODO:
  - [x] Update `tensor_binary_op` in test/test_binary_ufuncs.py after https://github.com/pytorch/pytorch/issues/58216 is merged.
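
A small sketch of the behavior this enables (scalar on the left-hand side):

```python
import torch

t = torch.tensor([2, 3, 4])
print(5 % t)                   # dispatches to Tensor.__rmod__ -> tensor([1, 2, 1])
print(torch.remainder(5, t))   # the scalar-first overload added here
```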

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58476

Reviewed By: ngimel

Differential Revision: D28776810

Pulled By: mruberry

fbshipit-source-id: 74f8aea80f439ef2cc370333524e39971eeb7bf4
2021-06-05 16:19:24 -07:00
Nikita Shulga
ba3a90b55e Revert D28819780: [TensorExpr] Fix handling of 0-dim tensors.
Test Plan: revert-hammer

Differential Revision:
D28819780

Original commit changeset: f3feff35a1ce

fbshipit-source-id: 1dca4ac9cea0b67e9f02800f6d5b3c7e4ae1d81a
2021-06-04 19:25:30 -07:00
Bert Maher
6309b342c3 [nnc] Enable CPU fuser inside FB, take 5 (#59461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59461

long tail test failures
ghstack-source-id: 130607578

Test Plan: fixed T92123560

Reviewed By: navahgar

Differential Revision: D28892885

fbshipit-source-id: 762a275b5aa14af0847e46cbf4036d3342b82189
2021-06-04 16:26:46 -07:00
Bert Maher
f5e3eae82a [nnc] Infer device type from nodes if inputs are all scalars (#59430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59430

With constant support added, we can now have fusion groups with only
scalar inputs.  So, we need to get the device type from the nodes in the graph
rather than just the inputs.
ghstack-source-id: 130613871

Test Plan: new unit test; also see test_tracer test_trace_of_script

Reviewed By: navahgar

Differential Revision: D28891989

fbshipit-source-id: f9e824acbd4856216b85a135c8cb60a2eac3c628
2021-06-04 16:25:33 -07:00
anjali411
3607478ecd Conjugate View (#54987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987

Based off of ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702) 's prototype:

Here's a summary of the changes in this PR:
This PR adds a new dispatch key called Conjugate. This enables us to make the conjugate operation a view and leverage the specialized library functions that fast-path with the hermitian operation (conj + transpose).

1. The conjugate operation will now return a view with the conj bit set (1) for complex tensors, and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical`, which returns the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor.
2. NEW API:
    a) `.conj()` -- now returning a view.
    b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory.
    c) `.conj_physical_()`, and `out=` variant
    d) `.resolve_conj()`  -- materializes the conjugation. returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0.
    e) `.resolve_conj_()` in-place version of (d)
    f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors.
    g) `view_as_real` -- existing function, but now errors out on conjugated tensors.
3. Conjugate Fallback
    a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor.
    b) This fallback is well equipped to handle the following cases:
        - functional operation e.g., `torch.sin(input)`
        - Mutable inputs and in-place operations e.g., `tensor.add_(2)`
        - out-of-place operation e.g., `torch.sin(input, out=out)`
        - Tensorlist input args
        - NOTE: Meta tensors don't work with conjugate fallback.
4. Autograd
    a) `resolve_conj()` is an identity function w.r.t. autograd
    b) Everything else works as expected.
5. Testing:
    a) All method_tests run with conjugate view tensors.
    b) OpInfo tests that run with conjugate views
        - test_variant_consistency_eager/jit
        - gradcheck, gradgradcheck
        - test_conj_views (that only run for `torch.cfloat` dtype)

NOTE: functions like `empty_like`, `zeros_like`, `randn_like`, `clone` don't propagate the conjugate bit.
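
A short sketch of the conjugate-view behavior described above:

```python
import torch

x = torch.tensor([1 + 2j, 3 - 4j])
y = x.conj()               # now a view; no data is copied
print(y.is_conj())         # True
print(y.resolve_conj())    # materializes the conjugation: tensor([1.-2.j, 3.+4.j])
print(x.conj_physical())   # eager conjugation into new memory
```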

Follow up work:
1. conjugate view RFC
2. Add neg bit to re-enable view operation on conjugated tensors
3. Update linalg functions to call into specialized functions that fast path with the hermitian operation.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28227315

Pulled By: anjali411

fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f
2021-06-04 14:12:41 -07:00
Mikhail Zolotukhin
d60efd8207 [TensorExpr] Fix handling of 0-dim tensors. (#59279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279

There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.

Differential Revision:
D28819780

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
2021-06-04 13:58:15 -07:00
Bert Maher
c3bf42e0d8 Fix symbolic derivative of hardswish (#59405)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59405

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28879698

Pulled By: bertmaher

fbshipit-source-id: 2f2d9836bf592b18ed9a19aab4f5967e653b5898
2021-06-03 23:12:18 -07:00
Bert Maher
9ac954789d [nnc] Add hardsigmoid (#59069)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59069

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28738166

Pulled By: bertmaher

fbshipit-source-id: d9f5b87ef1f2323a3631add79c2670ce794f911e
2021-06-03 23:10:36 -07:00
kshitij12345
ea465f7378 OpInfo: true_divide and minor fix (#59154)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59154

Reviewed By: ngimel

Differential Revision: D28780115

Pulled By: mruberry

fbshipit-source-id: 91e254698597fa0c7d4df6053ec017a85e180304
2021-05-30 18:35:30 -07:00
Mikhail Zolotukhin
27009d6129 [TensorExpr] Add NNC lowerings for aten::view, aten::reshape and aten::expand_as. (#59157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59157

Currently view is represented as a copy since we don't support inplace operations in NNC (similar to `aten::reshape`). The lowering for `aten::expand_as` is exactly the same as for `aten::expand`, since we're building the TE expression based on the output shape anyway.

Differential Revision:
D28774224

Test Plan: Imported from OSS

Reviewed By: Chillee

Pulled By: ZolotukhinM

fbshipit-source-id: 0a1593c4c6500dcc5a374213adb734180ae1f72e
2021-05-29 20:36:32 -07:00
Horace He
a427820350 [NNC] Added triangular_solve external call + fixed permute (#59131)
Summary:
The triangular_solve external call only returns the first output, since the second output is just a copy of one of the inputs. Why does that exist?

Also, I fixed the permute lowering - I was previously doing the inverse application of the permute.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59131

Reviewed By: ansley

Differential Revision: D28768169

Pulled By: Chillee

fbshipit-source-id: 8e78611c6145fb2257cb409ba98c14ac55cdbccf
2021-05-28 22:29:30 -07:00
kshitij12345
c9af4c2636 OpInfo: where (#58349)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58349

Reviewed By: mrshenli

Differential Revision: D28744220

Pulled By: mruberry

fbshipit-source-id: 893a2fb88a48a60df75c7d6e2f58a42ca949daa7
2021-05-28 18:22:03 -07:00
Kushashwa Ravi Shrimali
0c1420aa3c OpInfo: fmod and remainder (#57941)
Summary:
See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry Lezcano kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57941

Reviewed By: mrshenli

Differential Revision: D28744464

Pulled By: mruberry

fbshipit-source-id: 19847277d4f8d3a39a706c2b3c9eddf0dedcb20c
2021-05-27 20:32:56 -07:00
Bin Bao
7e4e648c2a Enable NNC fusion for relu6 (#58773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58773

Test Plan:
```
python test/test_ops.py -k relu6
python test/test_jit_fuser_te.py
```

Reviewed By: bertmaher

Differential Revision: D28721791

Pulled By: desertfire

fbshipit-source-id: a94f711977afd080faae052f66eb8dded3cdc79e
2021-05-27 10:54:02 -07:00
Bert Maher
e24362746a [nnc] Concat input shapes must be known to fuse (#58974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58974

I don't know how we overlooked this for so long...
ghstack-source-id: 129932134

Test Plan:
Predictor test of model 184778294_0 using multiple request replay
threads.  It's not clear to me why multithreading matters, except that perhaps
it makes it easier to get an unknown shape in the profile.

Reviewed By: navahgar

Differential Revision: D28702660

fbshipit-source-id: 565550b1d2e571d62d0c8b21150193f2a7ace334
2021-05-26 11:29:26 -07:00
Horace He
6093161158 Separated out working tests from not working tests for NNC OpInfo (#58788)
Summary:
This gets rid of a lot of the try/else rigamarole.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58788

Reviewed By: ZolotukhinM

Differential Revision: D28621054

Pulled By: Chillee

fbshipit-source-id: d0d8a1b6466eb318d939a1ed172b78f492ee0d5b
2021-05-22 02:24:23 -07:00
Horace He
e56d3b0238 Added OpInfo tests for NNC (#58719)
Summary:
Finds a couple of bugs:

1. permute needs to wrap dimensions
2. slice needs to wrap dimensions
3. frac doesn't work correctly for negative values
4. Permute has some other failures.

This PR also fixes 1 + 2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58719

Reviewed By: SplitInfinity

Differential Revision: D28590457

Pulled By: Chillee

fbshipit-source-id: a67fce67799602f9396bfeef615e652364918fbd
2021-05-21 01:41:28 -07:00
Edvard Ghazaryan
5211eeb22b Support aten::leaky_relu for TE (#58464)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58464

Test Plan:
./bin/test_tensorexpr

python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops

Reviewed By: Krovatkin

Differential Revision: D28499776

fbshipit-source-id: 20094a1bc78aa485f76aec4e065ff69e43d692d7
2021-05-20 16:12:03 -07:00
Bert Maher
3d20ddfe92 [nnc] Do not fuse unsqueeze with variable dim (#58346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58346

If `dim` is a variable, NNC doesn't know how to translate the result,
since the shape is unknown.  This issue manifested as a `bad_variant_access`
when we try to pull an int constant out of that arg.

Note that, while the PE will pick up the resultant shape, it won't set guards accordingly.
ghstack-source-id: 129078971

Test Plan: new fuser test

Reviewed By: navahgar

Differential Revision: D28460956

fbshipit-source-id: 57ef918ef309ee57bfdf86717b910b6549750454
2021-05-18 21:44:37 -07:00
Bert Maher
6b8b591a84 [nnc] Fix output restriding of size-1 dimensions (#58256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58256

Size-1 dims mess up our output restriding logic, because they're
technically "dense" no matter what stride the dimension has.  In this example a
size-1 dim has stride 1, which causes all the indices to be taken mod 1 (i.e.,
all indices become 0).  We work around this peculiar case by skipping size-1 in
our layout logic, since it has no impact on the rest of the tensor's indexing.
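
A tiny illustration of that point (assuming standard PyTorch stride semantics): a size-1 dimension can carry any stride without affecting which elements are addressed, so the tensor stays "dense" regardless.

```python
import torch

a = torch.empty_strided((4, 1, 8), (8, 1, 1))
b = torch.empty_strided((4, 1, 8), (8, 999, 1))   # arbitrary stride on the size-1 dim
print(a.is_contiguous(), b.is_contiguous())       # True True
```
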
ghstack-source-id: 128932739

Test Plan:
new unit test, plus
```
buck test mode/dev //langtech/mobile/audio_stream_processor:audio_stream_processor_test -- --exact 'langtech/mobile/audio_stream_processor:audio_stream_processor_test - AudioStreamProcessorTest.DemucsReadWriteFloat'
```

Reviewed By: eellison

Differential Revision: D28424388

fbshipit-source-id: e33e39eef2a5bf2797bee78a5987558308b6d110
2021-05-14 00:09:12 -07:00
Nick Korovaiko
c524448dd1 init hardshrink (#57749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57749

add to a fx test

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D28425974

fbshipit-source-id: 195c7a1944decb7a2a99c2831cab38485f32be17
2021-05-13 19:38:05 -07:00
Mikhail Zolotukhin
470cd64514 [TensorExpr] Remove disabled tests that we do not plan to re-enable. (#58207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58207

We probably don't even know what these tests check and there are no
plans on re-enabling them - let's just nuke them to keep the code clean.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28403251

Pulled By: ZolotukhinM

fbshipit-source-id: fe12e978636a74f309f57e3408ab78d459fe4d29
2021-05-13 09:19:20 -07:00
Mikhail Zolotukhin
a0f4b7cd48 [TensorExpr] Re-enable skipped tests, they seem to be working now. (#58206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58206

Tested on CUDA with and without `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1`.

Closes #48053.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28403250

Pulled By: ZolotukhinM

fbshipit-source-id: 1ae1cfed691e0077a37db646937e580fbd32b23f
2021-05-13 09:18:09 -07:00
Bert Maher
6955d4d0f7 [nnc] Handle only the first argument of aten::to (#58028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58028

We were trying to translate the device argument and thus throwing an
unsupported dtype.
ghstack-source-id: 128748658

Test Plan: predictor models

Reviewed By: navahgar

Differential Revision: D28347704

fbshipit-source-id: 331a5786339e01f9df1b1878970b0c5983a92980
2021-05-12 12:52:29 -07:00
Bert Maher
f97650e70b [nnc] Fix float->bool conversion on cpu (#57798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57798

Our instruction sequence was just plain wrong: instead of `fcmp une %x, +0.0` (unordered-or-not-equal to 0.0) we were doing `fcmp uno`, which is just an unordered check (i.e., is either side NaN).
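
For reference, the semantics the fused kernel needs to match: any value that is unordered or not equal to 0.0 (including NaN) converts to True.

```python
import torch

x = torch.tensor([0.0, -0.0, 1.5, float('nan')])
print(x.to(torch.bool))   # tensor([False, False,  True,  True])
```
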
ghstack-source-id: 128586464

Test Plan: New unit test against the full cross-product of dtypes.

Reviewed By: navahgar

Differential Revision: D28276269

fbshipit-source-id: ba5e59778e07770fb78ef02309f10edde333a800
2021-05-10 18:31:38 -07:00
Elias Ellison
241c2f4496 Add Gelu To NNC (#57753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57753

I'm not adding symbolic gradient because that is being added in https://github.com/pytorch/pytorch/pull/46785.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28262765

Pulled By: eellison

fbshipit-source-id: be365a2d392d7ac4bcc099a184762249ec2e18a6
2021-05-06 16:04:50 -07:00
Elias Ellison
7627dd568a hardswish reland (#57652)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57652

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D28226724

Pulled By: eellison

fbshipit-source-id: 585a91ffab7a855b5600e79130a37be25ef9b354
2021-05-05 17:21:43 -07:00
Shen Li
887d0e5657 Revert D28197820: [JIT][NNC] add hardswish symbolic gradient and NNC lowering
Test Plan: revert-hammer

Differential Revision:
D28197820 (0142fd0b57)

Original commit changeset: 05305d85c5bb

fbshipit-source-id: 2e1d9699515982ba2a9be06e83a2ce043ec857ee
2021-05-05 07:53:30 -07:00
eellison
0142fd0b57 [JIT][NNC] add hardswish symbolic gradient and NNC lowering (#57383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57383

Notes: I picked up an activation from https://github.com/pytorch/pytorch/issues/56969. You can look at the [activations.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Activation.cpp#L429) file which has both forward and backward kernel code to help you write the NNC lowering and the symbolic gradient.

I added a test in test_jit_fuser_te for the fusion, and I added an OpInfo and asserted that we expect to see autodiffable nodes to test the symbolic gradient.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D28197820

Pulled By: eellison

fbshipit-source-id: 05305d85c5bb0847c8f911b95ba47b137dca7e90
2021-05-04 23:39:59 -07:00
Bert Maher
151e81b7bc [nnc][tests] Skip long running tests when using TE interpreter (#57568)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57568

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28202740

Pulled By: bertmaher

fbshipit-source-id: 3f88aed91cd92c270ea5e6b504ae5ebc6810aa2b
2021-05-04 16:57:48 -07:00
Bert Maher
7c8a7efe3f [nnc] Enable all fuser tests for cpu (#57332)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57332

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28113481

Pulled By: bertmaher

fbshipit-source-id: b55e4bbcc25a09614b37985873b72337fdefc6b0
2021-04-30 10:11:06 -07:00
Bert Maher
17b8a4db1c [nnc] Support pow on CPU (#56308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56308

But only for float tensors.  Even on CUDA, int tensors just have weird
behavior with pow, and I bet FP is so much more common that it's just not worth
trying to fuse ints here.
ghstack-source-id: 126769637

Test Plan: `pytest test_jit_fuser_te.py -k test_binary_pow`

Reviewed By: navahgar

Differential Revision: D27834694

fbshipit-source-id: 7274d72cf02ab95d63574b6c17995b8f34560810
2021-04-20 15:13:03 -07:00
Mikhail Zolotukhin
5f19385588 [TensorExpr] Add aten::matmuls to TE fuser. (#54605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54605

For small sizes we generate a naive 3-level loop nest; for bigger sizes we generate an external call.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27298364

Pulled By: ZolotukhinM

fbshipit-source-id: 2ddf275ff68d6fca16a3befca5ce5c26aef462b5
2021-04-16 12:54:38 -07:00
Bert Maher
8e82e932f3 Reland: D27652485: [nnc] Enable CPU fusion only when num_threads == 1 (#56120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120

This reverts commit ad17fadbfc (D27786457).

The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.

I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu), and you're
OK if you're running with 1 thread, or you can add
`_jit_set_texpr_parallel_cpu_enabled` to enable it anyways.

This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D27788199

Pulled By: bertmaher

fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
2021-04-15 15:50:18 -07:00