Commit Graph

67 Commits

Author SHA1 Message Date
Wenlei Xie
22ecb8885f Disable device check for foreach kernels (#56871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56871

foreach kernels fall back to slow path when tensor are on different devices

Generated by codemod:
```
fastmod '(- func: _foreach.*)' '${1}
  device_check: NoCheck   # foreach kernels fall back to slow path when tensor are on different devices'   aten/src/ATen/native/native_functions.yaml
```
ghstack-source-id: 127914017

Test Plan: autotest

Reviewed By: ezyang

Differential Revision: D27986560

fbshipit-source-id: b0cd963cdba04b4e1589bbf369eb26b48d523968
2021-05-01 12:02:07 -07:00
Wenlei Xie
183320df96 Add device_check place holder for functions (#56870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56870

Automatic generation of device check code will be supported in
following PRs.

Changed are generaetd via:

1. Codemod
```
fastmod '  device_guard: False' '  device_check: NoCheck
  device_guard: False' aten/src/ATen/native/native_functions.yaml
```

2. Python script: https://gist.github.com/wenleix/be20c34bbbfcee0b289cdea2cf15b16c
ghstack-source-id: 127914016

Test Plan: auto test

Reviewed By: ezyang

Differential Revision: D27986427

fbshipit-source-id: 4e598a30306b80b5ade27af70d3e58770e401fc2
2021-05-01 12:02:05 -07:00
Peter Bell
8b3bf98cb8 Tell codegen that SparseCsrCUDA is cuda (#56602)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/50937, Fixes build failures in https://github.com/pytorch/pytorch/issues/56561

Currently SparseCsrCUDA is included in cpu build and also doesn't get code-generated device guards. This fixed both issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56602

Reviewed By: albanD

Differential Revision: D27921001

Pulled By: ezyang

fbshipit-source-id: 2b3b0b66d0a7c5ef96e0817d8852d511dd954ae4
2021-04-22 11:57:10 -07:00
Brian Hirsh
76fbd755c1 Reland of "D27708346: generate xla codegen in-tree" (#56601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56601

Updating it to ensure that RegistrationDeclarations.yaml is completely
unchanged

This reverts commit 90e532f3ef.

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27915305

Pulled By: bdhirsh

fbshipit-source-id: 491a025c44221690dad849f9a2166934130c0fec
2021-04-21 19:36:31 -07:00
Scott Wolchok
1211bccc65 [PyTorch] Fix const correctness for resize native functions (#55351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55351

We incorrectly used `Tensor&` to mean "the underlying
TensorImpl cannot be changed", as explained in
https://github.com/zdevito/ATen/issues/27#issuecomment-330717839 .
This diff gets us on the path to fixing this problem: we have an
incremental way to fix individual native functions so that we can
apply any handwritten fixes a few at a time. It gets the migration
started with the `resize` family of native functions.
ghstack-source-id: 127092677

Test Plan: fitsships

Reviewed By: ezyang

Differential Revision: D27583983

fbshipit-source-id: 4eeeec85f5d268e9d0f1645eb9396914a9f9557f
2021-04-21 14:51:41 -07:00
Brian Hirsh
90e532f3ef Revert D27708346: generate xla codegen in-tree
Test Plan: revert-hammer

Differential Revision:
D27708346 (51d0212d0f)

Original commit changeset: 2289edd641f3

fbshipit-source-id: 86711c07db19833b9e772c558e12accba1432499
2021-04-21 11:07:45 -07:00
Brian Hirsh
51d0212d0f generate xla codegen in-tree (#55050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55050

not ready for review yet

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27708346

Pulled By: bdhirsh

fbshipit-source-id: 2289edd641f30277d7561cf2d48ec69c6a2137a9
2021-04-21 08:19:08 -07:00
Edward Yang
f17c9ea2ed Port all unary float functions to structured (#56082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56082

The native_functions.yaml changes were done by codemod using the
following script:

```
import ruamel.yaml
from ruamel.yaml.tokens import CommentToken
from ruamel.yaml.error import CommentMark
from tools.codegen.model import *  # noqa: F403

with open("aten/src/ATen/native/native_functions.yaml", "r") as f:
    contents = f.read()

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 1000
yaml.boolean_representation = ['False', 'True']
r = yaml.load(contents)

convert = '''\
acos
acosh
asin
asinh
atan
atanh
cos
cosh
digamma
erf
erfc
erfinv
exp
expm1
exp2
lgamma
log
log10
log1p
log2
reciprocal
sigmoid
sin
sinc
sinh
special_entr
sqrt
tan
tanh'''.split()

for e in r:
    f = NativeFunction.from_yaml(e, Location("", 0))
    if f.structured or f.structured_delegate is not None:
        continue
    n = f.func.name.name.base
    if n not in convert:
        continue
    # mutate e to make changes
    if f.func.kind() == SchemaKind.out:
        e.insert(1, 'structured', True)
        e.insert(2, 'structured_inherits', 'TensorIteratorBase')
    else:
        # TODO: The .out overload assumption is not sound in general
        e.insert(1, 'structured_delegate', f'{n}.out')

        e['dispatch'].pop('CPU', None)
        e['dispatch'].pop('CUDA', None)
        e['dispatch'].pop('CPU, CUDA', None)
        e['dispatch'].pop('CompositeExplicitAutograd', None)

        *_, last_k = e.keys()
        needs_fixup = False

        if not e['dispatch']:
            if last_k == 'dispatch':
                needs_fixup = True
            del e['dispatch']

        # Manually fix up newlines at the end, because ruamel
        # made some bad life choices about where to associate trailing
        # whitespace for nested dicts; see
        # https://stackoverflow.com/questions/42172399/modifying-yaml-using-ruamel-yaml-adds-extra-new-lines
        if needs_fixup:
            *_, last_k = e.keys()
            # post_key, pre_key, post_value, pre_value
            e.ca.items[last_k] = [None, None, CommentToken('\n\n', CommentMark(0), None), None]

with open("aten/src/ATen/native/native_functions.yaml.new", "w") as f:
    yaml.dump(r, f)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27777769

Pulled By: ezyang

fbshipit-source-id: 1ecbac7cb3e0093167bb61c7d2b1ecb95b8ae17c
2021-04-15 16:06:42 -07:00
Sam Estep
4753100a3b Un-ignore F403 in .flake8 (#55838)
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html

This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).

This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838

Test Plan: CI. You can also run `flake8` locally.

Reviewed By: jbschlosser

Differential Revision: D27724232

Pulled By: samestep

fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
2021-04-13 09:24:07 -07:00
Sameer Deshmukh
5fb1142702 Add CSR (compressed sparse row) layout for sparse tensors (#50937)
Summary:
Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937

Reviewed By: mrshenli

Differential Revision: D27439865

Pulled By: ezyang

fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d
2021-04-12 10:09:12 -07:00
Ailing Zhang
6842da6251 [WIP]Relax some limitations of InferenceMode. (#54403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54403

A few important points about InferenceMode behavior:
1. All tensors created in InferenceMode are inference tensors except for view ops.
   - view ops produce output has the same is_inference_tensor property as their input.
     Namely view of normal tensor inside InferenceMode produce a normal tensor, which is
     exactly the same as creating a view inside NoGradMode. And view of
     inference tensor outside InferenceMode produce inference tensor as output.
2. All ops are allowed inside InferenceMode, faster than normal mode.
3. Inference tensor cannot be saved for backward.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27316483

Pulled By: ailzhang

fbshipit-source-id: e03248a66d42e2d43cfe7ccb61e49cc4afb2923b
2021-04-09 14:40:37 -07:00
Wenlei Xie
70af5db7ca Remove use_c10_dispatcher option (#54969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54969

With all use cases to hacky wrapper removed, all kernels will be
dispatched with c10 full dispatcher.
ghstack-source-id: 125434790

Test Plan: buck build //caffe2/aten/...

Reviewed By: ezyang, walterddr

Differential Revision: D27436596

fbshipit-source-id: 7a146d1f4a983b4a81f8552be4eec6c482b6bea2
2021-03-31 16:24:24 -07:00
Ailing Zhang
43d4f3b8d0 Implement public API InferenceMode and its error handling (#55008)
Summary:
https://www.internalfb.com/phabricator/paste/view/P360377337Pull Request resolved: https://github.com/pytorch/pytorch/pull/53343

For easier review, here's a diff between the version before revert. https://www.internalfb.com/phabricator/paste/view/P360750919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55008

Test Plan: Imported from OSS

Pulled By: ailzhang

Reviewed By: bhosmer

Differential Revision: D27443229

fbshipit-source-id: 01b03446a1f6373f43dd5c7170d26226b50f363c
2021-03-31 10:48:00 -07:00
Ailing Zhang
263180d7fc Revert D26973911: Implement public API InferenceMode and its error handling
Test Plan: revert-hammer

Differential Revision:
D26973911 (7caa464631)

Original commit changeset: 0ebdac7a3cd5

fbshipit-source-id: afd37a3785bc694e8ffbd679eba1cfed89ef2273
2021-03-29 11:17:49 -07:00
Ailing Zhang
7caa464631 Implement public API InferenceMode and its error handling (#53343)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53343

Test Plan: Imported from OSS

Reviewed By: ezyang, nikithamalgifb

Differential Revision: D26973911

Pulled By: ailzhang

fbshipit-source-id: 0ebdac7a3cd554822d26d5a40f539b6e2aaec61d
2021-03-27 13:44:23 -07:00
Edward Yang
13b1ca9466 Rename DefaultBackend to CompositeExplicitAutograd (#54470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54470

```
git grep -l 'DefaultBackend' | xargs sed -i 's/DefaultBackend/CompositeExplicitAutograd/g'
```

Plus a quick fixup in native/README.md

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D27253240

Pulled By: ezyang

fbshipit-source-id: 964df951ea8b52fa72937f3cc66aeaf49a702e6f
2021-03-26 10:53:30 -07:00
Edward Yang
145bc5cd51 Rename Math to CompositeImplicitAutograd (#54466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54466

I had to very carefully audit all the use sites since there are a lot
of other uses of the string Math; I did most of the conversion by
grepping for all occurrences of Math and then doing a search
replace.

I also updated documentation for clarity.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27253239

Pulled By: ezyang

fbshipit-source-id: afb485d07ff39575742a4f0e1e205179b60bc953
2021-03-24 13:49:24 -07:00
Edward Yang
6d0027197c Delete all unnecessary singular Math entries (#54436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54436

An operator entry with no dispatch table implicitly generates a Math
entry, so you don't need to define one yourself.  I also added
some asserts in the codegen to fail on these cases.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27235381

Pulled By: ezyang

fbshipit-source-id: f8c905090b863120f4f3656c37e2b7f26e8bb9ef
2021-03-23 00:44:01 -07:00
Edward Yang
6e8c4ad7fd s/StructuredNativeFunctions/NativeFunctionsGroup/ (#54427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54427

A StructuredNativeFunctions is no longer guaranteed to actually
be structured (test structured property for that), so we rename
this to a more neutral name.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27235380

Pulled By: ezyang

fbshipit-source-id: 2b438d615bf06a47fc9c7bf6eb66fd8b4df31bc8
2021-03-23 00:43:57 -07:00
Edward Yang
bf2ca35f35 Rejigger to use NativeFunctionsGroup even without structured: True (#54426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54426

Previously, we only put NativeFunctions in StructuredNativeFunctions
if the out variant advertised that the kernel was structured.  However,
there are a few code generation things that can take advantage of
this trio structure, even if the kernel itself hasn't been ported
to be structured.  So better to always group things when they are
related, and then let clients decide whether or not to use the
structure or throw it away.

While doing this, I had hoped that there weren't any functional/inplace
pairs that didn't also have an out variant.  This turned out to not
be true.  These are probably all oversights and should get fixed at
some point.

Bill of changes:

- The actual operational change happens in
  StructuredNativeFunctions.from_dict; then I need to relax some
  __post_init__ invariants.  To tell if a StructuredNativeFunctions
  is actually structured, there is a new structured property, which
  is queried from a few new locations in code
- Refactor native_functions.py into gen_structured/gen_unstructured
  functions so I can easily call gen_unstructured from two contexts

I intend to s/StructuredNativeFunctions/NativeFunctionsGroup/ but
for ease of review this rename hasn't been done in this PR.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27235379

Pulled By: ezyang

fbshipit-source-id: d8a15de9abb75b365348ab94e67b830704e30cf0
2021-03-23 00:43:54 -07:00
Edward Yang
282eefebf3 Delete defunct ComplexCPU/ComplexCUDA dispatch keys (#54013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54013

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D27051837

Pulled By: ezyang

fbshipit-source-id: c2a20737b6bd4a1317905bafceb2d8cb39f37e76
2021-03-16 15:20:04 -07:00
Ailing Zhang
274b96b878 Move as_view/increment_version to its separate key. (#53342)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53342

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26973913

Pulled By: ailzhang

fbshipit-source-id: bc7fc25d1a3a1f20cdfa1d7126fa559a84d194a4
2021-03-15 14:47:12 -07:00
Edward Yang
93c4f9f972 Split out RegisterDispatchKey to its own file (#51508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51508

No substantive changes.  The codegen for this file was getting a
bit long so I moved it off into tools.codegen.dest submodule (I
wanted to do tools.codegen.gen but that conflicts with the existing
module; oy vey!)  To do this I had to move some other functions around
so that they were more generally accessible.  Otherwise
self-explanatory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D26187856

Pulled By: ezyang

fbshipit-source-id: fd3784571d03d01c4acb7ca589fcde4492526408
2021-02-04 09:19:32 -08:00
Jiakai Liu
83287a6f2b [pytorch] change codegen dispatch key from string to enum (#51115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51115

Add enum type for dispatch key. Prepare to implement the DispatchTable
computation logic in python for static dispatch.

Verified byte-for-byte compatibility of the codegen output.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26077430

Pulled By: ljk53

fbshipit-source-id: 86e74f3eb32266f31622a2ff6350b91668c8ff42
2021-01-27 22:28:52 -08:00
Edward Yang
5e79b8e06d Back out "Revert D25903846: [pytorch][PR] Structured kernel definition for upsample_nearest2d" (#50794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50794

Original commit changeset: b4a7948088c0

There are some subtle extra tweaks on top of the original. I can unbundle them, but I've opted to keep it with the port because it's the easiest way to make sure the changes are exercised.

* There's a bugfix in the codegen to test if a dispatch key is structured *before* short circuiting because the dispatch key was missing in the table. This accounts for mixed structured-nonstructured situations where the dispatch table is present, but the relevant structured key isn't (because the dispatch table only exists to register, e.g., QuantizedCPU)
* Dispatch tables for functions which delegate to structured kernels don't have Math entries from generated for them.
* It's now illegal to specify a structured dispatch key in a delegated structured kernel (it will be ignored!) add is now fixed to follow this
* There are some extra sanity checks for NativeFunctions validation
* Finally, unlike the original PR, I switched the .vec variant of upsample_nearest2d to also be DefaultBackend, bringing it inline with upsample_nearest1d.
ghstack-source-id: 120038038

Test Plan:
```
buck test mode/dev //coreai/tiefenrausch:python_tests -- --exact 'coreai/tiefenrausch:python_tests - test_can_run_local_async_inference_cpu (coreai.tiefenrausch.tests.python_test.TiefenrauschPY)' --run-disabled
```

Reviewed By: ngimel

Differential Revision: D25962873

fbshipit-source-id: d29a9c97f15151db3066ae5efe7a0701e6dc05a3
2021-01-25 10:43:53 -08:00
Sebastian Messmer
d46210958e Remove use_c10_dispatcher: full lines added in the last couple days (#50769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50769

There were a couple new of these lines added in the last couple of days but they're not necessary anymore.
This PR removes them and also adds an assertion to make sure we don't add any more.
ghstack-source-id: 120133715

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25961316

fbshipit-source-id: e2befc5b6215b42decb2acedcacfb50734857e2f
2021-01-21 20:35:26 -08:00
Sebastian Messmer
e4c41b6936 Remove codegen logic to support non-c10-full ops (#49164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49164

This PR removes the logic paths in codegen that were responsible for handling non-c10-full ops.
This only goes through our basic codegen. It does not simplify C++ code yet and it does not remove the codegen for generated unboxing wrappers yet.
ghstack-source-id: 119450487

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25462977

fbshipit-source-id: 7e70d14bea96948f5056d98125f3e6ba6bd78285
2021-01-06 14:17:36 -08:00
Jiakai Liu
e71a13e8a3 [pytorch][codegen] migrate gen_variable_type to new data model (#49735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49735

This is the final wave of autograd codegen data model migration.

After this PR:
- autograd codegen no longer depends on Declarations.yaml;
- autograd codegen sources are fully type annotated and pass mypy-strict check;

To avoid potential merge conflicts with other pending PRs, some structural
changes are intentionally avoided, e.g. didn't move inner methods out, didn't
change all inner methods to avoid reading outer function's variables, and etc.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```

Test Plan: Imported from OSS

Reviewed By: ezyang, bhosmer

Differential Revision: D25678879

Pulled By: ljk53

fbshipit-source-id: ba6e2eb6b9fb744208f7f79a922d933fcc3bde9f
2021-01-05 14:12:39 -08:00
Edward Yang
0216366f0d Make use_c10_dispatcher: full mandatory for structured kernels (#49490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49490

No reason to let people to do the legacy thing for the brand new kernel.
This simplifies the codegen.  I have to port the two structured kernels
to this new format.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25595406

Pulled By: ezyang

fbshipit-source-id: b5931873379afdd0f3b00a012e0066af05de0a69
2021-01-04 11:59:24 -08:00
Edward Yang
8eee8460f8 codegen: Resolve overload ambiguities created by defaulted arguments (#49348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348

This is a redux of #45666 post refactor, based off of
d534f7d4c5
Credit goes to peterbell10 for the implementation.

Fixes #43945.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594004

Pulled By: ezyang

fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f
2021-01-04 11:59:16 -08:00
Edward Yang
7202c0ec50 Tighten up error checking on manual_kernel_registration (#49341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49341

I noticed that #49097 was using manual_kernel_registration incorrectly,
so this diff tightens up the testing so that:

1. We don't generate useless wrapper functions when manual_kernel_registration
is on (it's not going to be registered, so it does nothing).

2. manual_kernel_registration shouldn't affect generation of functions in
Functions.h; if you need to stop bindings, use manual_cpp_binding

3. Structured and manual_kernel_registration are a hard error

4. We raise an error if you set dispatch and manual_kernel_registration at the
same time.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594003

Pulled By: ezyang

fbshipit-source-id: 655b10e9befdfd8bc95f1631b2f48f995a31a59a
2021-01-04 11:59:12 -08:00
Sebastian Messmer
69ca5e1397 Enforce c10-fullness for all ops (#49619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49619

This is a minimal-change PR that enforces that all operators are c10-full by making it the default.

This does not clean up any code yet, that will happen in PRs stacked on top. But this PR already ensures
that there are no non-c10-full ops left and there will be no non-c10-full ops introduced anymore.
ghstack-source-id: 119269182

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25650198

fbshipit-source-id: efc53e884cb53193bf58a4834bf148453e689ea1
2021-01-04 11:26:53 -08:00
Iurii Zdebskyi
6230e337d5 Add torch._foreach_zero_ API (#47286)
Summary:
**In this PR**
- add `_foreach_zero_` API
- Update all optimizers under /_multi_tensor/ to use `_foreach_zero_` in `zero_grad` method

Performance improvement
----------------- OP:  zero_  -----------------
for-loop: 630.36 us
foreach: 90.84 us

script

```
import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]

def main():
    for op in [
            "zero_"
        ]:
        print("\n\n----------------- OP: ", op, " -----------------")
        stmt = "[torch.{op}(t) for t in inputs]"
        timer = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer)",
        )
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

        stmt = "torch._foreach_{op}(inputs)"
        timer_mta = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer_mta)",
        )
        print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()

```
**TODO**
- Refactor zero_grad once foreach APIs are stable.

**Tested** via unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47286

Reviewed By: ngimel

Differential Revision: D24706240

Pulled By: izdeby

fbshipit-source-id: aac69d6d134d65126ae8e5916f3627b73d8a94bf
2020-12-16 20:04:25 -08:00
Edward Yang
59e822026c Add manual_cpp_binding to native_functions.yaml (#49092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49092

Functions which specify manual_cpp_binding don't automatically
get C++ bindings generated for them in TensorBody.h or
Functions.h.  This lets end users manually define the bindings
themselves, which may be helpful if there is a way to
short circuit the dispatcher entirely.  contiguous() is switched
to use this mechanism.

Although manual_cpp_binding suggests that we don't generate the
binding at all, it is often the case that there is some "fast
path", but when this path is not satisfied, we should go back
to the slow dispatch.  So we still generate a fallback method/function
which the user-defined binding can call into in case that we
have to go slowpath.

The correctness conditions for bindings manually written in this
way are subtle.  Here are the ones I can think of off the top
of my head:

- Whatever condition is tested in the C++ body, must ALSO be
  tested again in the native:: implementation on the other
  side of the dispatcher.  This is because you are NOT GUARANTEED
  to hit the native:: implementation through the C++ binding,
  you may go straight to the implementation via a boxed call.

- If a binding is written in this way, it is only safe to
  skip dispatch if you would have returned the same tensor as
  before.  In any situation you would return a fresh tensor,
  you MUST go to the slow path, because you need to actually
  get to the autograd kernel.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25428440

Pulled By: swolchok

fbshipit-source-id: 6e71767cb8d1086d56cd827c1d2d56cac8f6f5fe
2020-12-10 21:56:53 -08:00
Edward Yang
9b0ffb9fb3 Delete cpp.group_arguments (#49043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49043

Previously, this function had nontrivial algorithmic content,
but after #48195, this was just a swiss army knife for pasting
together arguments while maintaining structure.  I added some
more properties for Arguments for convenient access in this way,
and then inlined the implementation of group_arguments into all of its call
sites, simplifying whenever contextual.  This might be controversial, but I
think the resulting code is easier to understand.

You may notice that there is some modest code duplication between
dispatcher.cpparguments_exprs and CppSignature.argument_packs.
This is a known problem and I will be attempting to fix it in
a follow up PR.

Confirmed to be byte-for-byte compatible.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455885

Pulled By: ezyang

fbshipit-source-id: 8fbe066e8c3cb7ee8adb5b87296ec5bd7b49e01f
2020-12-10 18:20:46 -08:00
Edward Yang
267641a245 Rename positional and kwarg_only to have flat prefix (#49042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49042

I want the names positional and kwarg_only to give the unflat
representation (e.g., preserving TensorOptionsArguments in the
returned Union).  So I regret my original naming choice when
I moved grouping to model.  This renames them to have flat_ prefix
and also adds a flat_non_out argument for cases where you just
want to look at non-out arguments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455884

Pulled By: ezyang

fbshipit-source-id: f923f8881267a3e3e8e9521519412f7cc25034fc
2020-12-10 18:20:43 -08:00
Edward Yang
16b8e6ab01 Class-based structured kernels, with migration of add to framework (#48718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48718

This PR rewrites structured kernels to do the class-based mechanism (instead of defining a meta and impl function, they are methods on a class), and adds enough customizability on the class to support TensorIterator. To show it works, add is made a structured kernel. Don't forget to check https://github.com/pytorch/rfcs/pull/9 for a mostly up-to-date high level description of what's going on here.

High level structure of this PR (the order you should review files):
* TensorMeta.h - TensorMeta is deleted entirely; instead, meta functions will call `set_output` to allocate/resize their outputs. MetaBase gets a new `maybe_get_output` virtual method for retrieving the (possibly non-existent) output tensor in a meta function; this makes it easier to do special promotion behavior, e.g., as in TensorIterator.
* TensorIterator.cpp - Two major changes: first, we add TensorIteratorBase::set_output, which is a "light" version of TensorIterator::set_output; it sets up the internal data structures in TensorIterator, but it doesn't do allocation (that is assumed to have been handled by the structured kernels framework). The control flow here is someone will call the subclassed set_output, which will allocate output, and then we will call the parent class (TensorIteratorBase) to populate the fields in TensorIterator so that other TensorIterator phases can keep track of it. Second, we add some tests for meta tensors, and skip parts of TensorIterator which are not necessary when data is not available.
* tools/codegen/model.py - One new field in native_functions.yaml, structured_inherits. This lets you override the parent class of a structured meta class; normally it's MetaBase, but you can make it point at TensorIteratorBase instead for TensorIterator based kernels
* tools/codegen/gen.py - Now generate all of the classes we promised. It's kind of hairy because this is the first draft. Check the RFC for what the output looks like, and then follow the logic here. There are some complications: I need to continue to generate old style wrapper functions even if an operator is structured, because SparseCPU/SparseCUDA/etc won't actually use structured kernels to start. The most complicated code generation is the instantiation of `set_output`, which by in large replicates the logic in `TensorIterator::set_output`. This will continue to live in codegen for the forseeable future as we would like to specialize this logic per device.
* aten/src/ATen/native/UpSampleNearest1d.cpp - The previous structured kernel is ported to the new format. The changes are very modest.
* aten/src/ATen/native/BinaryOps.cpp - Add is ported to structured.

TODO:
* Work out an appropriate entry point for static runtime, since native:: function stubs no longer are generated
* Refactor TensorIteratorConfig construction into helper functions, like before
* Make Tensor-Scalar addition structured to fix perf regression
* Fix `verify_api_visibility.cpp`
* Refactor tools/codegen/gen.py for clarity
* Figure out why header changes resulted in undefined reference to `at::Tensor::operator[](long) const`

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25278031

Pulled By: ezyang

fbshipit-source-id: 57c43a6e5df21929b68964d485995fbbae4d1f7b
2020-12-09 15:39:12 -08:00
Iurii Zdebskyi
dad74e58fc [WIP] Added foreach_trunc, foreahc_reciprocal, foreach_sigmoid APIs (#47385)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47385

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D24737051

Pulled By: izdeby

fbshipit-source-id: ed259d9184b2b784d8cc1983a8b85cc6cbf930ba
2020-12-07 10:47:23 -08:00
Edward Yang
742903c0df Move argument grouping into FunctionSchema (#48195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48195

The general approach is to change Arguments, splitting `positional`, `kwarg_only` and `out`, into `pre_self_positional`, `self_arg`, `post_self_positional`, and `pre_tensor_options_kwarg_only`, `tensor_options` and `post_tensor_options_kwarg_only`. The splits are as you'd expect: we extract out the self argument and the tensor options arguments, and record the other arguments that came before and after. To do this, we move the logic in `group_arguments` to the parsing process.

Some fuzz in the process:
* I renamed `ThisArgument` to `SelfArgument`, since we don't actually use the terminology "this" outside of C++ (and the model is Python biased)
* I kept the `group_arguments` function, which now just reads out the arguments from the structured model in the correct order. In the long term, we should get rid of this function entirely, but for now I kept it as is to reduce churn.
* I decided to arbitrarily say that when self is missing, everything goes in "post-self", but when tensor options is missing, everything goes in "pre-tensor-options". This was based on where you typically find the argument in question: self is usually at front (so most args are after it), while tensor options are typically at the end (so most args go before it).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25231166

Pulled By: ezyang

fbshipit-source-id: 25d77ad8319c4ce0bba4ad82e451bf536ef823ad
2020-12-02 07:57:11 -08:00
Edward Yang
ba5686f8c5 Refactor argument fields in FunctionSchema to Arguments (#48182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48182

I'm planning to add a bunch more argument fields following
https://github.com/pytorch/pytorch/pull/45890#discussion_r503646917 and
it will be a lot more convenient if the arguments get to live
in their own dedicated struct.  Type checker will tell you if
I've done it wrong.  No change to output.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25057897

Pulled By: ezyang

fbshipit-source-id: dd377181dad6ab0c894d19d83408b7812775a691
2020-12-02 07:57:06 -08:00
Jiakai Liu
de284b6d35 [pytorch][codegen] add autograd data model (#48249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48249

Introduced autograd related data models at tools.codegen.api.autograd.

Migrated load_derivatives.py to produce the new data models from derivatives.yaml.
It has clean mypy-strict result.

Changed both gen_autograd_functions.py and gen_variable_type.py to consume
the new data model.

Added type annotations to gen_autograd_functions.py - it has clean mypy-strict
result except for the .gen_autograd import (so haven't added it to the strict
config in this PR).

To limit the scope of the PR, gen_variable_type.py is not refactored, and the
main structure of load_derivatives.py / gen_autograd_functions.py is kept. We
only make necessary changes to make it work.

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25086561

Pulled By: ljk53

fbshipit-source-id: 1f43ab0931d9814c24683b9a48ca497c5fc3d729
2020-11-19 21:47:05 -08:00
Jiakai Liu
f98ab18445 [pytorch][codegen] move is_abstract property to NativeFunction model (#48252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48252

Moved to a shared place so that gen_variable_type.py can reuse it.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25087808

Pulled By: ljk53

fbshipit-source-id: 1f32e506956fc4eb08734cfde0add47b3e666bd9
2020-11-19 12:30:13 -08:00
Iurii Zdebskyi
94cd048bda Added foreach_frac API (#47384)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47384

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24737052

Pulled By: izdeby

fbshipit-source-id: 8c94cc42bf22bfbb8f78bfeb2017a5756045763a
2020-11-17 16:56:30 -08:00
Iurii Zdebskyi
134bce7cd0 Adding bunch of unary foreach APIs (#47875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47875

Implementing several unary operators for _foreach_ APIs.
### Planned list of ops
- [x]  abs
- [x]  acos
- [x]  asin
- [x]  atan
- [x]  ceil
- [x]  cos
- [x]  cosh
- [x]  erf
- [x]  erfc
- [x]  exp
- [x]  expm1
- [x]  floor
- [x]  log
- [x]  log10
- [x]  log1p
- [x]  log2
- [ ]  frac
- [x]  neg
- [ ]  reciprocal
- [x]  round
- [ ]  rsqrt
- [ ]  sigmoid
- [x]  sin
- [x]  sinh
- [x]  sqrt
- [x]  tan
- [x]  tanh
- [ ]  trunc
- [x]  lgamma
- [ ]  digamma
- [ ]  erfinv
- [ ]  sign
- [ ]  mvlgamma
- [ ]  clamp
- [ ]  clamp_min
- [ ]  clamp_max

### Perf results
```
----------------- OP:  sin  -----------------
  Median: 998.79 us
  300.84 us

----------------- OP:  abs  -----------------
  Median: 1.19 ms
  294.97 us

----------------- OP:  acos  -----------------
  Median: 982.30 us
  299.40 us

----------------- OP:  asin  -----------------
  Median: 1.16 ms
  298.09 us

----------------- OP:  atan  -----------------
  Median: 986.92 us
  295.64 us

----------------- OP:  ceil  -----------------
  Median: 1.17 ms
  297.25 us

----------------- OP:  cos  -----------------
  Median: 972.72 us
  294.41 us

----------------- OP:  cosh  -----------------
  Median: 1.17 ms
  294.97 us

----------------- OP:  erf  -----------------
  Median: 1.17 ms
  297.02 us

----------------- OP:  erfc  -----------------
  Median: 1.14 ms
  299.23 us

----------------- OP:  exp  -----------------
  Median: 1.15 ms
  298.79 us

----------------- OP:  expm1  -----------------
  Median: 1.17 ms
  291.79 us

----------------- OP:  floor  -----------------
  Median: 1.17 ms
  293.51 us

----------------- OP:  log  -----------------
  Median: 1.13 ms
  318.01 us

----------------- OP:  log10  -----------------
  Median: 987.17 us
  295.57 us

----------------- OP:  log1p  -----------------
  Median: 1.13 ms
  297.15 us

----------------- OP:  log2  -----------------
  Median: 974.21 us
  295.01 us

----------------- OP:  frac  -----------------
  Median: 1.15 ms
  296.01 us

----------------- OP:  neg  -----------------
  Median: 1.13 ms
  294.98 us

----------------- OP:  reciprocal  -----------------
  Median: 1.16 ms
  293.69 us

----------------- OP:  round  -----------------
  Median: 1.12 ms
  297.48 us

----------------- OP:  sigmoid  -----------------
  Median: 1.13 ms
  296.53 us

----------------- OP:  sin  -----------------
  Median: 991.02 us
  295.78 us

----------------- OP:  sinh  -----------------
  Median: 1.15 ms
  295.70 us

----------------- OP:  sqrt  -----------------
  Median: 1.17 ms
  297.75 us

----------------- OP:  tan  -----------------
  978.20 us
  297.99 us

----------------- OP:  tanh  -----------------
  Median: 967.84 us
  297.29 us

----------------- OP:  trunc  -----------------
  Median: 1.14 ms
  298.72 us

----------------- OP:  lgamma  -----------------
  Median: 1.14 ms
  317.53 us
```

### Script

```

import torch
import torch.optim as optim
import torch.nn as nn
import torchvision
import torch.utils.benchmark as benchmark_utils

inputs = [torch.rand(3, 200, 200, device="cuda") for _ in range(100)]

def main():
    for op in [
            "sin", "abs", "acos", "asin", "atan", "ceil",
            "cos", "cosh", "erf", "erfc",
            "exp", "expm1", "floor", "log",
            "log10", "log1p", "log2", "frac",
            "neg", "reciprocal", "round",
            "sigmoid", "sin", "sinh", "sqrt",
            "tan", "tanh", "trunc", "lgamma"
        ]:
        print("\n\n----------------- OP: ", op, " -----------------")
        stmt = "[torch.{op}(t) for t in inputs]"
        timer = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer)",
        )
        print(f"autorange:\n{timer.blocked_autorange()}\n\n")

        stmt = "torch._foreach_{op}(inputs)"
        timer_mta = benchmark_utils.Timer(
            stmt=stmt.format(op = op),
            globals=globals(),
            label="str(optimizer_mta)",
        )
        print(f"autorange:\n{timer_mta.blocked_autorange()}\n\n")

if __name__ == "__main__":
    main()

```

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24948801

Pulled By: izdeby

fbshipit-source-id: defec3c0394d6816d9a8b05a42a057348f1b4d96
2020-11-17 16:51:54 -08:00
Edward Yang
cdc2d2843b Structured kernel definitions (#45277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45277

Implements structured kernels as per https://github.com/pytorch/rfcs/pull/9 and ports upsample_nearest1d to use the framework.

The general structure of this diff:

- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
- NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these is defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
- The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they're are a bit different as the call your meta functions, and make reference to the actual implementations in out.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.

Missing pieces:

- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`

* Functional: before 1314105, after 1150705
* Out: before 915705, after 838405

These numbers may be jittered up to +-16400 (which is the difference when I tested against an unaffected operator `at::upsample_linear1d`), though that may also because unrelated changes affected all operators globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D24253555

Test Plan: Imported from OSS

Reviewed By: smessmer

Pulled By: ezyang

fbshipit-source-id: 4ef58dd911991060f13576864c8171f9cc614456
2020-11-17 15:24:43 -08:00
Iurii Zdebskyi
1c45631f10 Revert D24737050: [WIP] Adding bunch of unary foreach APIs
Test Plan: revert-hammer

Differential Revision:
D24737050 (b6a2444eff)

Original commit changeset: deb59b41ad1c

fbshipit-source-id: 76cd85028114cfc8fc5b7bb49cd27efc2e315aa5
2020-11-10 09:41:41 -08:00
Iurii Zdebskyi
b6a2444eff [WIP] Adding bunch of unary foreach APIs (#47383)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47383

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D24737050

Pulled By: izdeby

fbshipit-source-id: deb59b41ad1c79b66cafbd9a9d3d6b069794e743
2020-11-09 14:14:28 -08:00
Brian Hirsh
7a0f0d24d0 Codegen - error when an argument that looks like an out argument isn't a kwarg (fix #43273) (#47284)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47284

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24706763

Pulled By: bdhirsh

fbshipit-source-id: 60fbe81a0dff7e07aa8c169235d15b84151d3ed7
2020-11-03 16:30:01 -08:00
Edward Yang
54d83296a9 Desugar missing dispatch field into singleton Math entry (#46970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46970

Now that catchall declarations are reinterpreted as registrations to
dispatch key Math, we can now simplify code generation logic by directly
generating to Math, and bypasing logic for catchall.  This also helps
avoid bugs where we incorrectly classify some kernels as Math and others
as not, even though they get registered in the same way.

Bill of changes:
- Give Math its own unique TORCH_LIBRARY_IMPL
- Make it so NativeFunction.dispatch is always non-None.  Simplify
  downstream conditionals accordingly
- When parsing NativeFunction, fill in missing dispatch with a
  singleton Math entry (pointing to the cpp.name!)

One thing that is a little big about this change is a lot of kernels
which previously didn't report as "math" now report as math.  I picked
a setting for these booleans that made sense to me, but I'm not sure
if e.g. XLA will handle it 100% correctly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24592391

Pulled By: ezyang

fbshipit-source-id: 2e3355f19f9525698864312418df08411f30a85d
2020-10-29 14:43:44 -07:00
Alban Desmaison
46b252b83a Revert D24262885: [pytorch][PR] Added foreach_zero_ API
Test Plan: revert-hammer

Differential Revision:
D24262885 (8e37dcb1f3)

Original commit changeset: 144c283dd009

fbshipit-source-id: 451b202e23bc1fcb11b20d26c11d9a1329789d22
2020-10-28 06:48:59 -07:00