Commit Graph

70 Commits

Brian Hirsh
eca98fedb5 split out NamedCType from CType. Remove direct string comparison from autograd codegen (#55334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55334

The goal of this PR is to clean up some of the autograd codegen to compare C++ types using `CType` objects instead of raw strings. My last PR in the stack made that string comparison a little more fragile, since the raw C++ strings needed to be namespace-aware.

I confirmed byte-for-byte no codegen changes vs. the last PR (which added namespaces to the codegen) by running `diff -qr ../pytorch-common_test/torch/csrc/autograd/generated/ ../pytorch-callgrind_test_after2/torch/csrc/autograd/generated/` and `diff -qr ../pytorch-common_test/build/aten/src/ATen/ ../pytorch-callgrind_test_after2/build/aten/src/ATen/`

Note that a better end-state for the autograd codegen would be to do all of its type pattern matching directly off of JIT types, instead of off of CTypes (which are really just generated from JIT types, incorporating C++-specific semantics). That looks like it’ll require a pretty substantial change though, so I’m not doing it in this PR.

As part of this change (and after talking with ezyang), I split off the `CType` data class into a separate `NamedCType` class, which holds a name and a `CType`. This way, `CType` only knows about actual C++ types, making it easier to compare CTypes to each other in the codegen when we only care about the type. The core change is in `types.py`, but it required a bunch of downstream changes to update all of the places where we create `CType`s to create `NamedCType`s instead.
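
For readers following along, a minimal sketch of the resulting split (field names are assumptions based on this description, not a verbatim copy of `types.py`):

```py
from dataclasses import dataclass

# A CType knows only about the actual C++ type...
@dataclass(frozen=True)
class BaseCType:
    cpp: str  # e.g. "at::Tensor"

    def cpp_type(self) -> str:
        return self.cpp

# ...while NamedCType pairs a CType with the name it binds to.
# Comparing types in the codegen is now `a.type == b.type`,
# with no raw string munging.
@dataclass(frozen=True)
class NamedCType:
    name: str        # what this binding denotes, e.g. "self"
    type: BaseCType  # the C++ type of the binding

    def cpp_type(self) -> str:
        return self.type.cpp_type()
```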

The main change in the autograd codegen was that I updated `SavedAttribute` to store a `NamedCType`. The other autograd changes all pretty much came from that change.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27708347

Pulled By: bdhirsh

fbshipit-source-id: 3e07c80569c7b229c638f389e76e319bff6315f9
2021-04-16 11:43:08 -07:00
Brian Hirsh
947c7a8215 add C++ namespacing logic to ctypes (#55047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55047

Added namespaces to all of the `CTypes` printed in the codegen. This is pretty much required if we want to use codegen externally, since we can no longer assume that we're inside of the `at::` namespace.

Important changes are in `types.py`.

How do we add the notion of namespaces to C++ types without people having to write "at::Tensor" everywhere? Before this PR, `CType` held a raw string representing the type, e.g. `BaseCType("Tensor", binds)`. This PR introduces a set of singleton base C++ types in `types.py` that know how to print their namespace. Instead, we'd write `BaseCType(tensorT, binds)`, where printing `tensorT` will properly print out "at::Tensor".

This also means that you can't create arbitrary `CTypes`. If we need a new C++ type in the codegen, we need to add it to the list in `types.py`.
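
A minimal sketch of the singleton approach (shapes assumed from the description above, not verbatim from `types.py`):

```py
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BaseCppType:
    ns: Optional[str]  # C++ namespace, e.g. "at"
    name: str          # bare type name, e.g. "Tensor"

    def __str__(self) -> str:
        return f"{self.ns}::{self.name}" if self.ns else self.name

# The fixed set of base types the codegen may use; a new C++ type
# must be added here rather than spelled as an arbitrary string.
tensorT = BaseCppType('at', 'Tensor')
scalarT = BaseCppType('at', 'Scalar')
boolT = BaseCppType(None, 'bool')

print(tensorT)  # prints "at::Tensor"
```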

One blip in the design: we don't want to change `RegistrationDeclarations.yaml`, since that'll break external backends that ingest it. I added separate functions to display types without the namespace, which are used to create `RegistrationDeclarations.yaml`. With an external codegen API though, we can eventually kill it :)

I also didn't realize until this PR that `Declarations.yaml` is still directly in use, by some python/autograd codegen. Rather than keep that yaml byte-for-byte compatible, I just updated the callsites in the autograd codegen to work with namespaces. In the NEXT pr, I try to clean up some of the autograd codegen to stop using raw strings to match against C++ types.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27708349

Pulled By: bdhirsh

fbshipit-source-id: 56a4f81fc101795bcb9ee1f722121480fb2356ad
2021-04-16 11:43:06 -07:00
Brian Hirsh
164bee1d09 Return a CType instead of a string for returns, beef up CType (#55046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55046

Updating `returns` in the codegen to return a CType instead of a raw string.

This has the benefit of putting all stringifying logic through CType, which is useful in the followup PR when I add namespaces.

I also added new CTypes for other templated C++ types: array, vector and tuple. Mostly because it makes the namespacing logic in the next PR significantly easier. It also seems more natural to me that `BaseCType` shouldn't represent specializations of templated types.

There's a little bit of weirdness with types that are currently *only* used for returns, e.g. `TupleCType`. Returns aren't named, so I opted not to give it a name; we can add one later if we discover that we need it.
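
A minimal sketch of the templated CTypes described above (field names assumed, not verbatim):

```py
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class BaseCType:
    cpp: str  # e.g. "at::Tensor"

    def cpp_type(self) -> str:
        return self.cpp

# Specializations of templated types get their own nodes instead of
# being smuggled through BaseCType as raw strings.
@dataclass(frozen=True)
class VectorCType:
    elem: BaseCType

    def cpp_type(self) -> str:
        return f"::std::vector<{self.elem.cpp_type()}>"

@dataclass(frozen=True)
class TupleCType:
    elems: Sequence[BaseCType]

    def cpp_type(self) -> str:
        return f"::std::tuple<{','.join(e.cpp_type() for e in self.elems)}>"

ret = TupleCType([BaseCType('at::Tensor'), BaseCType('at::Tensor')])
print(ret.cpp_type())  # ::std::tuple<at::Tensor,at::Tensor>
```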

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27708348

Pulled By: bdhirsh

fbshipit-source-id: 230b210c3e53be1bd362105fbea8451055dc59a8
2021-04-16 11:41:46 -07:00
Edward Yang
6ec71ed4f9 Replace all direct cdata access with THPVariable_Unpack (#55799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799

I'm going to change the implementation of cdata soon so I need to
abstract over cdata access with a function.  Additionally, many
users are manually casting to THPVariable to access
the member, so I can remove these unsafe casts in the client code
(the implementation, of course, still does an unsafe cast.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27712130

Pulled By: ezyang

fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3
2021-04-15 08:57:04 -07:00
albanD
1d49fd31c4 [reland] Add formulas and basic tests (#56083)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/49098
See original issue for details.

The only difference with the previous PR is the fix of the _embedding_bag_dense_backward formula, to stop declaring a backward formula for an argument that does not exist.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56083

Reviewed By: samestep

Differential Revision: D27778221

Pulled By: albanD

fbshipit-source-id: 159ef91ca931ef2ccfbc3d1c46c7880c32919dc9
2021-04-15 07:52:43 -07:00
Sam Estep
817fd932ac Revert D25607505: Add formulas and basic tests
Test Plan: revert-hammer

Differential Revision:
D25607505 (70f5905565)

Original commit changeset: fe2315d58768

fbshipit-source-id: 519d7426a6f32f0db51c4f360e5d5a79dbaac99d
2021-04-14 14:50:43 -07:00
albanD
70f5905565 Add formulas and basic tests (#49098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49098

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds:
- Codegen support to define forward grad formulas, and a few manual formulas
- Codegen support to automatically generate formulas, as well as a few usages of it
- Tests for basic forward grad components

Codegen-generated examples follow.
For each of them, the only part that changes is the if statement before the return that checks whether the forward grad is defined.

- For manual entry:
```yaml
- name: max(Tensor self) -> Tensor
  self: evenly_distribute_backward(grad, self, result)
  result: max_forward(self_fw_grad, self, result)
```

```cpp
Tensor max(const Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  std::shared_ptr<MaxBackward1> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<MaxBackward1>(new MaxBackward1(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto tmp = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    return at::max(self_);
  })();
  auto result = std::move(tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  throw_error_for_complex_autograd(result, "max");
  if (isFwGradDefined(self)) {
      auto self_fw_grad = toLegacyFwGrad(self);
      auto self_primal = toLegacyPrimal(self);
      auto result_new_fw_grad = max_forward(self_fw_grad, self_primal, result);
      if (result_new_fw_grad.defined()) {
        result.set_fw_grad(result_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  if (grad_fn) {
    grad_fn->result_ = SavedVariable(result, true);
  }
  return result;
}
```

- For element wise entry:
```yaml
- name: abs(Tensor self) -> Tensor
  self: grad * self.sgn()
  result: auto_element_wise
```

```cpp
Tensor abs(const Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  std::shared_ptr<AbsBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AbsBackward>(new AbsBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto tmp = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    return at::abs(self_);
  })();
  auto result = std::move(tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  throw_error_for_complex_autograd(result, "abs");
  if (isFwGradDefined(self)) {
      auto self_fw_grad = toLegacyFwGrad(self);
      auto self_primal = toLegacyPrimal(self);
      auto result_new_fw_grad = self_fw_grad * self_primal.sgn();
      if (result_new_fw_grad.defined()) {
        result.set_fw_grad(result_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  return result;
}
```
- For linear entry:
```yaml
- name: clone(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor
  self: grad
  result: auto_linear
```

```cpp
Tensor clone(const Tensor & self, c10::optional<MemoryFormat> memory_format) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  std::shared_ptr<CloneBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<CloneBackward>(new CloneBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto tmp = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    return at::clone(self_, memory_format);
  })();
  auto result = std::move(tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  if (isFwGradDefined(self)) {
      auto self_fw_grad = toLegacyFwGrad(self);
      auto result_new_fw_grad = at::clone(self_fw_grad, memory_format);
      if (result_new_fw_grad.defined()) {
        result.set_fw_grad(result_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  return result;
}
```

- For no entry:
```yaml
- name: angle(Tensor self) -> Tensor
  self: angle_backward(grad, self)
```

```cpp
Tensor angle(const Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  std::shared_ptr<AngleBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AngleBackward>(new AngleBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto tmp = ([&]() {
    at::AutoNonVariableTypeMode non_var_type_mode(true);
    return at::angle(self_);
  })();
  auto result = std::move(tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  throw_error_for_complex_autograd(result, "angle");
  TORCH_CHECK(!(isFwGradDefined(self)), "Trying to use forward prop with angle that does not support it.");
  return result;
}
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25607505

Pulled By: albanD

fbshipit-source-id: fe2315d587689af1cd5968536fa26c680b8b8829
2021-04-14 14:13:30 -07:00
Sam Estep
4753100a3b Un-ignore F403 in .flake8 (#55838)
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html

This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
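
For illustration, a sketch of the shape of the change, using `tools.codegen.model` as the example module (the exact import lists in the PR vary per file):

```py
# Before: flake8 F403; the reader can't tell where names come from.
# from tools.codegen.model import *

# After: an explicit list of imported items.
from tools.codegen.model import NativeFunction, FunctionSchema
```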

This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838

Test Plan: CI. You can also run `flake8` locally.

Reviewed By: jbschlosser

Differential Revision: D27724232

Pulled By: samestep

fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
2021-04-13 09:24:07 -07:00
Wenlei Xie
70af5db7ca Remove use_c10_dispatcher option (#54969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54969

With all uses of the hacky wrapper removed, all kernels will be
dispatched through the full c10 dispatcher.
ghstack-source-id: 125434790

Test Plan: buck build //caffe2/aten/...

Reviewed By: ezyang, walterddr

Differential Revision: D27436596

fbshipit-source-id: 7a146d1f4a983b4a81f8552be4eec6c482b6bea2
2021-03-31 16:24:24 -07:00
Edward Yang
6e8c4ad7fd s/StructuredNativeFunctions/NativeFunctionsGroup/ (#54427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54427

A StructuredNativeFunctions is no longer guaranteed to actually
be structured (test the structured property for that), so we rename
it to a more neutral name.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27235380

Pulled By: ezyang

fbshipit-source-id: 2b438d615bf06a47fc9c7bf6eb66fd8b4df31bc8
2021-03-23 00:43:57 -07:00
Edward Yang
d226985257 Read out layout from options directly, rather than via backend (#54074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54074

I don't see why this shouldn't work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D27086594

Pulled By: ezyang

fbshipit-source-id: 1d5f1997017ec48c4140f43e44f0d8a3df28ac7f
2021-03-22 08:20:13 -07:00
Wenlei Xie
2ecb2c7931 Pass Scalar by reference (#53583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583

`Scalar` takes 32 bytes because `c10::complex<double>`
requires 16-byte alignment. Passing Scalar by reference
shows about a 1% improvement in instruction count.

All the changes in this commit are codemoded except for
the following 4 files (which code-gen signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```

# Codemode

## Main Step

For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```

As you can tell, it codemods both `Scalar` and `optional<Scalar>`.  Apply these commands iteratively until reaching a fix-point (since one method signature might contain multiple `Scalar` parameters).

In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea. (I reverted those changes manually later; see https://github.com/pytorch/pytorch/pull/53479 as a reference.)

## Pre-Step

Prior to applying the main command: some `Scalar`s appear as `at::Scalar` or `c10::Scalar`, so I codemodded those in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```

## Fixup
There are a couple of post-codemod fixups. For example, `const Scalar` gets codemoded into `const const Scalar&`, and `at::Scalar` gets codemoded into `at::const Scalar&` (if the pre-step is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```

## Supplementary

`cu` and `mm` files also need to be codemoded, for example:

```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```

Function pointers are not codemoded. Here is an incomplete list:

```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'

# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```

Some corner cases need to be fixed manually.

ghstack-source-id: 123970306

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D26904445

fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
2021-03-15 23:17:06 -07:00
Ailing Zhang
aeb3e93351 Move view handling logic to gen_inplace_or_view_type.py (#53341)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53341

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26973912

Pulled By: ailzhang

fbshipit-source-id: ea31bdef0beac6996d509f5d45ebefa3ea8e2b89
2021-03-11 21:25:15 -08:00
Ailing Zhang
9f75de278f Move common autograd utils functions from gen_variable_type.py to api/autograd.py. (#53340)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53340

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26973914

Pulled By: ailzhang

fbshipit-source-id: 8367a08b27b25808782c77aadc3c67d07c354957
2021-03-11 19:58:45 -08:00
Edward Yang
37bf6c134b Register DefaultBackend implementations for functional/inplace structured operators (#53037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53037

As remarked in #52277 it is easy to give an (inefficient, due to extra
redispatches) DefaultBackend implementation of foo and foo_ in terms of
foo_out.  This patch enables code generation for DefaultBackend in these
cases by default for all structured kernels.  You can see the payoff
in MSNPU extension: it only has to register a kernel for add.out, and it
gets add and add_ kernels automatically.

The actual code changes are very modest:
- When DefaultBackend, call the dispatched (not direct native::)
  functions to allocate tensors, change device guard, etc
- Don't call impl() for DefaultBackend (as it doesn't exist); instead,
  directly generate a call to at::foo_out to do the actual work.
- Do NOT generate DefaultBackend implementation for foo_out.  Actually,
  there is a case to be made for this being a good idea with more infra;
  see comments inside.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D26731225

Pulled By: ezyang

fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
2021-03-02 14:13:08 -08:00
Edward Yang
c5a67f1675 Fix minor inaccuracy in translate error reporting (#53032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53032

Previously, you could get this error message:

```
Failed to synthesize the expression "Tensor & out".
When I failed, the following bindings were available in the context:

  const Tensor & self;
  const Tensor & other;
  Scalar alpha;
  const Tensor & op.outputs_[0];
```

There's a problem with this error message: it doesn't seem like there
is any 'out' argument available, but actually there is: the last
binding in the context is exactly that.  We printed the *expression*, not
the *ctype name*.

After this patch, the context now prints as:

```
  const Tensor & self; // self
  const Tensor & other; // other
  Scalar alpha; // alpha
  const Tensor & out; // op.outputs_[0]
```

Now it becomes clear that it's a const mismatch.  Maybe we could also
beef up the error message so it points out near misses, but I'll leave
that to future work.
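
A hedged sketch of the fix: render each binding's ctype (type plus name) as the declaration, with its expression as a trailing comment (the tuple shape here is illustrative, not the actual codegen types):

```py
def render_context(ctx):
    # ctx: iterable of (cpp_type, name, expr) triples for available bindings
    lines = []
    for cpp_type, name, expr in ctx:
        # the ctype name is what can be matched against; the expression
        # (e.g. "op.outputs_[0]") is just how we'd spell it in C++
        lines.append(f"  {cpp_type} {name}; // {expr}")
    return "\n".join(lines)

print(render_context([
    ("const Tensor &", "self", "self"),
    ("const Tensor &", "out", "op.outputs_[0]"),
]))
```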

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D26729768

Pulled By: ezyang

fbshipit-source-id: adb363551a7145eac788943c20969c86b1f8a81b
2021-03-02 14:11:28 -08:00
Brian Hirsh
d02a2bd5d1 codegen'd API for redispatching (#52008)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52008

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26356079

Pulled By: bdhirsh

fbshipit-source-id: 1fd34fbb4dbc48cc8390cad99e30e0d04fc75a4f
2021-02-22 10:55:38 -08:00
Jiakai Liu
c9c4b871a5 [pytorch] reintroduce static dispatch (#51957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957

This is a simplified version of #51554.

Compared to #51554, this version only supports statically dispatching to
a specific backend. The benefit is that it skips the dispatch key
computation logic and thus has less framework overhead. The downside is that
if input tensors do not match the specified backend, it will throw an error
instead of falling back to regular dispatch.

Sample code:
```
Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) {
    return at::cpu::empty(size, options, memory_format);
}

// aten::conj(Tensor(a) self) -> Tensor(a)
Tensor conj(const Tensor & self) {
    return at::math::conj(self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_out(Tensor & out, const Tensor & self) {
    return at::cpu::conj_out(out, self);
}

// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_outf(const Tensor & self, Tensor & out) {
    return at::cpu::conj_out(out, self);
}

// aten::_conj(Tensor self) -> Tensor
Tensor _conj(const Tensor & self) {
    return at::defaultbackend::_conj(self);
}
```

For ops without the specific backend dispatch, it will throw error:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
    TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU.");
}
```

Differential Revision: D26337857

Test Plan: Imported from OSS

Reviewed By: bhosmer

Pulled By: ljk53

fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364
2021-02-19 11:41:39 -08:00
Edward Yang
4d85e30133 Support at::cpu on non-structured kernels (#51590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51590

This PR backports a subset of Jiakai's changes from
https://github.com/pytorch/pytorch/pull/51554 that adds support
for at::cpu in non-structured kernels.

The unusual bits:

- Need to add a new forward inference rule for doing conversions
  of const optional<Tensor>& to const Tensor&
- Need to give the wrapper functions a prefix so that the call to
  wrapper is not ambiguous

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D26209871

Pulled By: ezyang

fbshipit-source-id: 8162686039675ab92a2af7a14f6b18941f8944df
2021-02-04 09:19:45 -08:00
Edward Yang
81c7c3bae5 Add api.structured; switch structured kernels to use const Tensor& everywhere (#51490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51490

Mutable Tensor ref is a source of endless confusion for kernel writers;
if we're going to make everyone rewrite their kernels, might as well
also get rid of mutable Tensor& while we're at it.

This is a refactor-then-small-update double whammy.  The refactor
is to separate tools.codegen.api.structured from api.native for
describing the type signatures of structured kernels (previously,
I was naughtily reusing native for this purpose--now I need it to
behave differently as Tensor).  This started off as a copy paste, but
since there are not that many structured kernels so far I could delete
all of the legacy logic from native that didn't make sense (without
having to go out and fix all the use sites all at once).

One more small addition was teaching translate to convert Tensor& to const Tensor&.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26182413

Pulled By: ezyang

fbshipit-source-id: ed636866add3581179669cf9283f9835fcaddc06
2021-02-03 14:03:46 -08:00
Edward Yang
648cdb7d0a Relax type signature for tools.codegen.api.translate (#51477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51477

Passing in a full binding is still OK, but if you have less
(e.g., an Expr/CType), that will do too. I'll need this for
some codegen patches I'm doing later.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26179560

Pulled By: ezyang

fbshipit-source-id: 5730dfb2c91bf5325496e57b0c91eb6823c9194d
2021-02-03 14:00:47 -08:00
Scott Wolchok
4495b49ffa [PyTorch] Pass TensorOptions by value (#51165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51165

`TensorOptions` does not have a non-trivial copy, move, or
destroy operation and is small enough to fit in a register, so it
seems like we should pass it by value.
ghstack-source-id: 120697498

Test Plan:
Measured timing for empty framework overhead benchmark before & after this change:

Before:
```
I0126 16:02:50.662864 2137574 bench.cpp:139] Mean 0.268645
I0126 16:02:50.662891 2137574 bench.cpp:140] Median 0.267485
I0126 16:02:50.662896 2137574 bench.cpp:141] Min 0.266485
I0126 16:02:50.662901 2137574 bench.cpp:142] stddev 0.00219359
I0126 16:02:50.662915 2137574 bench.cpp:143] stddev / mean 0.00816537

          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

After:
```
I0126 16:23:20.010062 2244624 bench.cpp:139] Mean 0.266814
I0126 16:23:20.010092 2244624 bench.cpp:140] Median 0.265759
I0126 16:23:20.010099 2244624 bench.cpp:141] Min 0.260291
I0126 16:23:20.010107 2244624 bench.cpp:142] stddev 0.00548279
I0126 16:23:20.010118 2244624 bench.cpp:143] stddev / mean 0.0205491

          2,983.75 msec task-clock                #    0.995 CPUs utilized            ( +-  0.36% )
               243      context-switches          #    0.082 K/sec                    ( +-  1.26% )
                 1      cpu-migrations            #    0.000 K/sec
            11,422      page-faults               #    0.004 M/sec                    ( +-  0.18% )
     5,928,639,486      cycles                    #    1.987 GHz                      ( +-  0.36% )  (50.02%)
    16,105,928,210      instructions              #    2.72  insn per cycle           ( +-  0.05% )  (50.02%)
     3,150,273,453      branches                  # 1055.809 M/sec                    ( +-  0.03% )  (50.05%)
         3,713,617      branch-misses             #    0.12% of all branches          ( +-  0.83% )  (50.07%)

```

It looked close to neutral, so I used `perf stat` to confirm it's about a 1% instruction count win.

For deciding whether this stack is worth it, I went back and ran `perf stat` on the baseline diff before I started touching the dispatcher:

```
          2,968.37 msec task-clock                #    0.997 CPUs utilized            ( +-  0.03% )
               250      context-switches          #    0.084 K/sec                    ( +-  2.21% )
                 1      cpu-migrations            #    0.000 K/sec
            11,403      page-faults               #    0.004 M/sec                    ( +-  0.28% )
     5,898,481,882      cycles                    #    1.987 GHz                      ( +-  0.03% )  (50.05%)
    16,169,242,938      instructions              #    2.74  insn per cycle           ( +-  0.03% )  (50.06%)
     3,076,546,626      branches                  # 1036.443 M/sec                    ( +-  0.05% )  (50.05%)
         2,531,859      branch-misses             #    0.08% of all branches          ( +-  0.89% )  (50.03%)
```

If I've done the arithmetic correctly, we have an 0.39% instruction count win.

Reviewed By: ezyang

Differential Revision: D25983863

fbshipit-source-id: 87d1451a01ead25738ea6b80db270d344bc583b2
2021-02-01 12:40:08 -08:00
Edward Yang
2ab497012f Add at::cpu namespace of functions for structured kernels (#49505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49505

I have a problem which is that static runtime needs a way to bypass
dispatch and call into kernels directly.  Previously, it used
native:: bindings to do this; but these bindings no longer exist
for structured kernels!  Enter at::cpu: a namespace of exactly
at:: compatible functions that assume all of their arguments are
CPU and non-autograd!  The header looks like this:

```
namespace at {
namespace cpu {

CAFFE2_API Tensor & add_out(Tensor & out, const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor add(const Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & add_(Tensor & self, const Tensor & other, Scalar alpha=1);
CAFFE2_API Tensor & upsample_nearest1d_out(Tensor & out, const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d(const Tensor & self, IntArrayRef output_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor & upsample_nearest1d_backward_out(Tensor & grad_input, const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);
CAFFE2_API Tensor upsample_nearest1d_backward(const Tensor & grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales=c10::nullopt);

}}
```

This slows down static runtime because these are not the "allow
resize of nonzero tensor" variant bindings (unlike the ones I had manually
written).  We can restore this: it's a matter of adding codegen smarts to
do this, but I haven't done it just yet since it's marginally more
complicated.

In principle, non-structured kernels could get this treatment too.
But, like an evil mastermind, I'm withholding it from this patch, as an extra
carrot to get people to migrate to structured muahahahaha.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25616105

Pulled By: ezyang

fbshipit-source-id: 84955ae09d0b373ca1ed05e0e4e0074a18d1a0b5
2021-01-22 13:11:59 -08:00
Jiakai Liu
24fd84313f [pytorch] fix ConstRefCType usage in codegen/api/native.py (#50742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50742

Fixed the other usage of `BaseCType('const ...&')` noted in #49138.

Checked byte-for-byte compatibility of the codegen output.
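
A minimal sketch of the pattern being fixed (shapes assumed, not verbatim from the codegen):

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseCType:
    cpp: str  # a bare type, e.g. "Tensor" -- not "const Tensor &"

    def cpp_type(self) -> str:
        return self.cpp

# const-ref is modeled as a wrapper node rather than baked into the
# raw string, so the codegen can still see the underlying type.
@dataclass(frozen=True)
class ConstRefCType:
    elem: BaseCType

    def cpp_type(self) -> str:
        return f"const {self.elem.cpp_type()} &"

# BaseCType("const Tensor &") becomes:
print(ConstRefCType(BaseCType("Tensor")).cpp_type())  # const Tensor &
```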

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25955565

Pulled By: ljk53

fbshipit-source-id: 83ebd6b039892b805444867ed97a6e2fa6e72225
2021-01-20 15:01:37 -08:00
Sebastian Messmer
47c57b8836 Fix Native signature for optional Tensor arguments (#50767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50767

The native signature for optional tensor arguments wrongly produced "Tensor" instead of "optional<Tensor>". We didn't notice this because all internal ops currently use hacky_wrapper, and for hacky_wrapper, "Tensor" is correct.

This PR fixes that and ports one op (batch_norm) to not use hacky_wrapper anymore, as proof of the fix.
ghstack-source-id: 120017543

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25960941

fbshipit-source-id: ca6fe133109b5d85cff52390792cf552f12d9590
2021-01-19 21:55:46 -08:00
Sebastian Messmer
e4c41b6936 Remove codegen logic to support non-c10-full ops (#49164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49164

This PR removes the logic paths in codegen that were responsible for handling non-c10-full ops.
This only goes through our basic codegen. It does not simplify C++ code yet and it does not remove the codegen for generated unboxing wrappers yet.
ghstack-source-id: 119450487

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25462977

fbshipit-source-id: 7e70d14bea96948f5056d98125f3e6ba6bd78285
2021-01-06 14:17:36 -08:00
Jiakai Liu
e71a13e8a3 [pytorch][codegen] migrate gen_variable_type to new data model (#49735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49735

This is the final wave of autograd codegen data model migration.

After this PR:
- autograd codegen no longer depends on Declarations.yaml;
- autograd codegen sources are fully type annotated and pass mypy-strict check;

To avoid potential merge conflicts with other pending PRs, some structural
changes are intentionally avoided: e.g. I didn't move inner methods out, didn't
change all inner methods to avoid reading the outer function's variables, etc.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```

Test Plan: Imported from OSS

Reviewed By: ezyang, bhosmer

Differential Revision: D25678879

Pulled By: ljk53

fbshipit-source-id: ba6e2eb6b9fb744208f7f79a922d933fcc3bde9f
2021-01-05 14:12:39 -08:00
Edward Yang
6c833efd65 Move default or no default logic into native.argument (#49489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49489

Previously, it was done at a use site, but that meant other use
sites didn't get the right logic.  Pushing it in makes sure everyone
gets it.

I also fixed one case of confusion where defn() was used to define a decl().
If you want to define a declaration with no defaults, say no_default().decl(),
which is more direct and will give us code reviewers a clue as to whether you
should have pushed this logic in.
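
A toy sketch of the decl()/defn()/no_default() distinction being leaned on here (a simplified Binding; the real one lives in tools/codegen/api/types.py):

```py
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Binding:
    type: str
    name: str
    default: Optional[str] = None

    def no_default(self) -> 'Binding':
        # drop the default, so decl() prints a default-free declaration
        return replace(self, default=None)

    def decl(self) -> str:
        # declarations may carry defaults
        d = f"{self.type} {self.name}"
        return d if self.default is None else f"{d}={self.default}"

    def defn(self) -> str:
        # definitions never carry defaults
        return f"{self.type} {self.name}"

alpha = Binding("const Scalar &", "alpha", "1")
print(alpha.decl())               # const Scalar & alpha=1
print(alpha.no_default().decl())  # const Scalar & alpha
```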

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25595407

Pulled By: ezyang

fbshipit-source-id: 89c664f0ed4d95699794a0d3123d54d0f7e4cba4
2021-01-04 11:59:20 -08:00
Edward Yang
8eee8460f8 codegen: Resolve overload ambiguities created by defaulted arguments (#49348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348

This is a redux of #45666 post refactor, based off of
d534f7d4c5
Credit goes to peterbell10 for the implementation.

Fixes #43945.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594004

Pulled By: ezyang

fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f
2021-01-04 11:59:16 -08:00
Edward Yang
8e20594b38 Construct CppSignatureGroup from NativeFunction (#49245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245

This will make it easier to implement the POC in
d534f7d4c5
see also https://github.com/pytorch/pytorch/pull/45666

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594005

Pulled By: ezyang

fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e
2021-01-04 11:55:28 -08:00
Sebastian Messmer
c7e9abb66a Making ops c10-full: list of optional tensors (#49138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138

See for details: https://fb.quip.com/QRtJAin66lPN

We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`; instead, we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this.

## Backwards Compatibility

- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`, and that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`. But another common call pattern was `tensor.index(indices_tensor)`, where previously the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`; to implicitly convert `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would be two implicit conversions, and C++ doesn't allow chaining two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`.

ghstack-source-id: 119269131

Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4});
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x[0] = 1                                                                |11566015 |11566015|0      |0.00% |
|x.index({0})                                                            |6807019  |6801019 |-6000  |-0.09%|
|x.index({0, 0})                                                         |13529019 |13557019|28000  |0.21% |
|x.index({0, 0, 0})                                                      |10677004 |10692004|15000  |0.14% |
|x.index({"..."})                                                        |5512015  |5506015 |-6000  |-0.11%|
|x.index({Slice(None, None, None)})                                      |6866016  |6936016 |70000  |1.02% |
|x.index({None})                                                         |8554015  |8548015 |-6000  |-0.07%|
|x.index({false})                                                        |22400000 |22744000|344000 |1.54% |
|x.index({true})                                                         |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|

### Autograd
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4}, torch::requires_grad());
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.

#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x.index({0})                                                            |14839019|14833019|-6000| 0.00% |
|x.index({0, 0})                                                         |28342019|28370019|28000| 0.00% |
|x.index({0, 0, 0})                                                      |24434004|24449004|15000| 0.00% |
|x.index({"..."})                                                       |12773015|12767015|-6000| 0.00% |
|x.index({Slice(None, None, None)})                                      |14837016|14907016|70000| 0.47% |
|x.index({None})                                                        |15926015|15920015|-6000| 0.00% |
|x.index({false})                                                        |36958000|37477000|519000| 1.40% |
|x.index({true})                                                         |41971408|42426094|454686| 1.08% |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) |168184392|164545682|-3638710| -2.16% |

Reviewed By: bhosmer

Differential Revision: D25454632

fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d
2021-01-04 05:04:02 -08:00
Edward Yang
3efd5d8f01 Introduce tools.codegen.api.translate (#49122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49122

cpparguments_exprs has induced a lot of head scratching in many recent PRs about how to structure the code in a good way.  This PR replaces the old algorithm with an entirely new algorithm inspired by logic programming.  The net result is shorter, cleaner and should be more robust to future changes.

This PR is a bit of a whopper.  Here is the order to review it.

- tools/codegen/api/types.py
  - Deleted CppArgument, CppArgumentPackIface (and subclasses), CppExpr, DispatcherExpr, DispatcherArgument, NativeExpr, NativeArgument, MetaArgument. All things previously called XArgument are now Binding. All things previously called XExpr are now Expr. I deleted the `__str__` implementation on Binding and fixed all call sites not to use it. On Binding, I renamed `str_no_default` and `str_default` to `defn` and `decl` for better symmetry with the corresponding signature concepts, although I'm open to naming them back to their original versions.
  - Obviously, things are less type safe without the class distinctions. So I introduce a new ADT called CType. CType represents the *semantic C++ type* of a binding: it is both the C++ type (e.g., `const Tensor&`) as well as the argument name that specifies what the binding denotes (e.g., `other`). Every binding now records its CType. The key observation here is that you don't actually care if a given expression is from the cpp or dispatcher or native API; what you care about is having enough information to know what the expression means, so you can use it appropriately. CType has this information. For the most part, ArgNames are just the string names of the arguments as you see them in JIT schema, but there is one case (`possibly_redundant_memory_format`) where we encode a little extra information. Unlike the plain strings we previously used to represent C++ types, CTypes have a little bit of structure around optionals and references, because the translation code needs to work around these concepts.
  - I took the opportunity to kill all of the private fields like `_arguments` and `_returns_type` (since the argument types don't make sense anymore). Everything is computed for you on the fly. If this is a perf problem in codegen we can start using `cached_property` decorator.
  - All of the heavy lifting in CppSignature.argument_packs has been moved to the cpp module. We'll head over there next. Similarly, all of the exprs methods are now calling translate, the new functionality which we haven't gotten to yet
- tools/codegen/api/cpp.py
   - We refactor all of the type computation functions to return CType instead of str. Because CTypes need to know the denotation, there is a new `binds: ArgName` argument to most functions that provides the denotation, so we can slot it in. (An alternative would have been to construct CTypes without denotations and then fill them in post-facto, but I didn't do it this way. One downside is there are some places where I need a CType without denotation, so I fill these in with `__placeholder__` whenever this happens).
  - `argument` and `arguments` are now extremely simple. There is no more Pack business, just produce one or more Bindings. The one thing of note is that when both a `memory_format` and `options` are in scope, we label the memory format as `possibly_redundant_memory_format`. This will be used in translation.
- tools/codegen/api/dispatcher.py and tools/codegen/api/native.py - same deal as cpp.py. One thing is that `cpparguments_exprs` is deleted; that is in the translator
- tools/codegen/api/translate.py - the translator! It uses a very simple backwards deduction engine to work out how to fill in the arguments of functions. There are comments in the file that explain how it works; a toy sketch of the idea follows this list.
- Everything else: just some small call site tweaks for places when I changed API.
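
As promised above, a toy sketch of the backwards-deduction idea. The real translator handles many more rules and uses the actual CType/Binding classes; the shapes below are illustrative only:

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class CType:
    cpp: str      # e.g. "const Tensor &"
    denotes: str  # e.g. "self"

@dataclass(frozen=True)
class Binding:
    expr: str     # a C++ expression we already have
    ctype: CType  # its semantic type

def translate(ctx, goal: CType) -> str:
    # Direct hit: a binding whose semantic type matches the goal.
    for b in ctx:
        if b.ctype == goal:
            return b.expr
    # Deduction rule: "const T &" is satisfiable by a plain "T".
    if goal.cpp.startswith("const ") and goal.cpp.endswith(" &"):
        return translate(ctx, CType(goal.cpp[6:-2], goal.denotes))
    # Deduction rule: "c10::optional<T>" can be wrapped from a "T".
    if goal.cpp.startswith("c10::optional<") and goal.cpp.endswith(">"):
        inner = CType(goal.cpp[14:-1], goal.denotes)
        return f"c10::optional<{inner.cpp}>({translate(ctx, inner)})"
    raise RuntimeError(f"Failed to synthesize {goal.cpp} {goal.denotes}")

ctx = [Binding("self_", CType("Tensor", "self"))]
print(translate(ctx, CType("const Tensor &", "self")))         # self_
print(translate(ctx, CType("c10::optional<Tensor>", "self")))  # c10::optional<Tensor>(self_)
```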

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25455887

Pulled By: ezyang

fbshipit-source-id: 90dc58d420d4cc49281aa8647987c69f3ed42fa6
2020-12-16 16:18:40 -08:00
Iurii Zdebskyi
5716b7db72 Enabled Scalar lists (#48222)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48222

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25074765

Pulled By: izdeby

fbshipit-source-id: 96ebe3c9907178c9338c03fb7993b2ecb26db8f4
2020-12-11 16:04:50 -08:00
Brian Hirsh
b94ec8c9f7 pyi codegen - removing byte-for-byte compatibility hacks (#49055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49055

Removed the majority of the TODO hacks that I added to the original pyi PR to maintain byte-for-byte compatibility.

I left a few of the divergences between pyi deprecated vs. native signatures, since (a) they're smaller and (b) it might make more sense to kill the deprecated functions at some point entirely.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25410847

Pulled By: bdhirsh

fbshipit-source-id: cf07cdda92f7492cd83d363cbb810e3810f6b8c8
2020-12-11 13:29:19 -08:00
Brian Hirsh
9920adebfd pyi cleanup (#49054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49054

These are some followups from the first pyi codegen PR. Still maintaining byte-for-byte compatibility in this one.

- Separated `argument_str()` (which took a pyi flag) into two functions: `argument_str()` and `argument_str_pyi()`
- Added a notes section for pyi at the top of `python.py`
- Added a `Python Interface` section that I moved the free-standing pyi functions to

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25410848

Pulled By: bdhirsh

fbshipit-source-id: db83a80af900c32b5e32d67ce27767f6e7c2adfb
2020-12-11 13:27:41 -08:00
Edward Yang
59e822026c Add manual_cpp_binding to native_functions.yaml (#49092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49092

Functions which specify manual_cpp_binding don't automatically
get C++ bindings generated for them in TensorBody.h or
Functions.h.  This lets end users manually define the bindings
themselves, which may be helpful if there is a way to
short circuit the dispatcher entirely.  contiguous() is switched
to use this mechanism.

Although manual_cpp_binding suggests that we don't generate the
binding at all, it is often the case that there is some "fast
path", but when this path is not satisfied, we should go back
to the slow dispatch.  So we still generate a fallback method/function
which the user-defined binding can call into in case we
have to take the slow path.

The correctness conditions for bindings manually written in this
way are subtle.  Here are the ones I can think of off the top
of my head:

- Whatever condition is tested in the C++ body, must ALSO be
  tested again in the native:: implementation on the other
  side of the dispatcher.  This is because you are NOT GUARANTEED
  to hit the native:: implementation through the C++ binding,
  you may go straight to the implementation via a boxed call.

- If a binding is written in this way, it is only safe to
  skip dispatch if you would have returned the same tensor as
  before.  In any situation you would return a fresh tensor,
  you MUST go to the slow path, because you need to actually
  get to the autograd kernel.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25428440

Pulled By: swolchok

fbshipit-source-id: 6e71767cb8d1086d56cd827c1d2d56cac8f6f5fe
2020-12-10 21:56:53 -08:00
Edward Yang
9b0ffb9fb3 Delete cpp.group_arguments (#49043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49043

Previously, this function had nontrivial algorithmic content,
but after #48195, this was just a swiss army knife for pasting
together arguments while maintaining structure.  I added some
more properties for Arguments for convenient access in this way,
and then inlined the implementation of group_arguments into all of its call
sites, simplifying where context allowed.  This might be controversial, but I
think the resulting code is easier to understand.

You may notice that there is some modest code duplication between
dispatcher.cpparguments_exprs and CppSignature.argument_packs.
This is a known problem and I will be attempting to fix it in
a follow up PR.

Confirmed to be byte-for-byte compatible.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455885

Pulled By: ezyang

fbshipit-source-id: 8fbe066e8c3cb7ee8adb5b87296ec5bd7b49e01f
2020-12-10 18:20:46 -08:00
Edward Yang
267641a245 Rename positional and kwarg_only to have flat prefix (#49042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49042

I want the names positional and kwarg_only to give the unflat
representation (e.g., preserving TensorOptionsArguments in the
returned Union).  So I regret my original naming choice when
I moved grouping to model.  This renames them to have a flat_ prefix
and also adds a flat_non_out argument for cases where you just
want to look at non-out arguments.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455884

Pulled By: ezyang

fbshipit-source-id: f923f8881267a3e3e8e9521519412f7cc25034fc
2020-12-10 18:20:43 -08:00
Edward Yang
0dea76ecda Delete some dead functions from tools.codegen.api.meta (#49041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49041

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D25455886

Pulled By: ezyang

fbshipit-source-id: 5d7834d52f7032820ac2c73358bda77187c17224
2020-12-10 18:16:09 -08:00
Edward Yang
16b8e6ab01 Class-based structured kernels, with migration of add to framework (#48718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48718

This PR rewrites structured kernels to do the class-based mechanism (instead of defining a meta and impl function, they are methods on a class), and adds enough customizability on the class to support TensorIterator. To show it works, add is made a structured kernel. Don't forget to check https://github.com/pytorch/rfcs/pull/9 for a mostly up-to-date high level description of what's going on here.

High level structure of this PR (the order you should review files):
* TensorMeta.h - TensorMeta is deleted entirely; instead, meta functions will call `set_output` to allocate/resize their outputs. MetaBase gets a new `maybe_get_output` virtual method for retrieving the (possibly non-existent) output tensor in a meta function; this makes it easier to do special promotion behavior, e.g., as in TensorIterator.
* TensorIterator.cpp - Two major changes: first, we add TensorIteratorBase::set_output, which is a "light" version of TensorIterator::set_output; it sets up the internal data structures in TensorIterator, but it doesn't do allocation (that is assumed to have been handled by the structured kernels framework). The control flow here is someone will call the subclassed set_output, which will allocate output, and then we will call the parent class (TensorIteratorBase) to populate the fields in TensorIterator so that other TensorIterator phases can keep track of it. Second, we add some tests for meta tensors, and skip parts of TensorIterator which are not necessary when data is not available.
* tools/codegen/model.py - One new field in native_functions.yaml, structured_inherits. This lets you override the parent class of a structured meta class; normally it's MetaBase, but you can make it point at TensorIteratorBase instead for TensorIterator based kernels
* tools/codegen/gen.py - Now generate all of the classes we promised. It's kind of hairy because this is the first draft. Check the RFC for what the output looks like, and then follow the logic here. There are some complications: I need to continue to generate old-style wrapper functions even if an operator is structured, because SparseCPU/SparseCUDA/etc won't actually use structured kernels to start. The most complicated code generation is the instantiation of `set_output`, which by and large replicates the logic in `TensorIterator::set_output`. This will continue to live in codegen for the foreseeable future, as we would like to specialize this logic per device.
* aten/src/ATen/native/UpSampleNearest1d.cpp - The previous structured kernel is ported to the new format. The changes are very modest.
* aten/src/ATen/native/BinaryOps.cpp - Add is ported to structured.
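
As a rough illustration of the model.py change above, here is a hedged sketch of how the new field might hang off the NativeFunction model; the field names come from this description, everything else is a simplified assumption:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NativeFunction:
    func: str                                   # e.g. "add.out(...)" schema string
    structured: bool = False
    structured_inherits: Optional[str] = None   # the new field

def meta_parent_class(f: NativeFunction) -> str:
    # Default parent of the generated meta class is MetaBase; TensorIterator
    # kernels set structured_inherits to TensorIteratorBase in native_functions.yaml.
    return f.structured_inherits or "MetaBase"
```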

TODO:
* Work out an appropriate entry point for static runtime, since native:: function stubs are no longer generated
* Refactor TensorIteratorConfig construction into helper functions, like before
* Make Tensor-Scalar addition structured to fix perf regression
* Fix `verify_api_visibility.cpp`
* Refactor tools/codegen/gen.py for clarity
* Figure out why header changes resulted in undefined reference to `at::Tensor::operator[](long) const`

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25278031

Pulled By: ezyang

fbshipit-source-id: 57c43a6e5df21929b68964d485995fbbae4d1f7b
2020-12-09 15:39:12 -08:00
Sebastian Messmer
3ef36dca8e Faithful out arguments (#47712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47712

This adds a faithful API for ops with out arguments, as described in https://docs.google.com/document/d/1h7nBibRwkRLQ8rsPhfALlwWR0QbkdQm30u4ZBwmaps8/edit# .

After this, an op will generate the following overloads for the C++ API:

```cpp
// Generated from the aten::abs operator (NOT from aten::abs.out)
Tensor at::abs(const Tensor& self)

// Generated from the aten::abs.out operator
Tensor& at::abs(const Tensor& self, Tensor& out)      // faithful: schema argument order
Tensor& at::abs_out(Tensor& out, const Tensor& self)  // traditional: out argument first

```

This is an important step towards making those ops c10-full (it allows VariableType, XLA and other backends to ignore reordering and just call through with the same argument order), but this does not make any of those ops c10-full yet.
It enables the faithful API independently of c10-fullness. That means the API is more consistent (the same shape of API for all ops), and making an op c10-full in the future will not trigger further C++ API changes.
ghstack-source-id: 118068091

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D24835252

fbshipit-source-id: dedfabd07140fc8347bbf16ff219aad3b20f2870
2020-12-08 03:48:42 -08:00
Brian Hirsh
ba6511b304 pyi codegen update - remove Declarations.yaml (#48754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48754

The goal of this PR is to kill Declarations.yaml in the pyi codegen, in favor of native_functions + the existing python object model.

**High-level design**

Since the python signatures used by the `python_arg_parser` are “supposed” to resemble the corresponding pyi type hint signatures, I re-used the existing python object model that Jiakai defined in `tools/codegen/api/python.py`. This means that the pyi codegen now reads `native_functions.yaml`, parses it into a bunch of `PythonSignatureGroup` objects, and emits corresponding method + function variants of type-hint signatures for each one, respectively into `__init__.pyi` and `_VariableFunctions.pyi`.

What makes this uglier is that pyi and the python arg parser have a number of differences in how they’re emitted. I expressed that through a `pyi` flag on the `PythonSignature` dataclass, that tells it whether or not to print itself as a pyi vs. arg_parser signature.

One thing worth noting is how pyi generates signatures differently for native vs. deprecated ops.

For native ops:
- The pyi codegen fuses functional and out variants of each op into a single signature with an optional `out` argument. Ops without an `out` variant just get an ordinary functional signature.
- Some ops that fit certain criteria also get a second “varargs” signature - basically ops with a single positional argument of type List[int].

For deprecated signatures:
- Functional and out variants are not fused - they each get their own signature entry
- There are no varargs signatures

This is currently implemented through the `signature_str()` and `signature_str_vararg()` methods on the `PythonSignature`/`PythonSignatureDeprecated` classes.  `signature_str()` knows how to print itself with/without out arguments, differently for native/deprecated ops. `signature_str_vararg()` optionally returns a vararg variant of the signature if one exists.
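
A hedged sketch of the dual-format printing with heavily simplified fields; the real PythonSignature in tools/codegen/api/python.py renders arguments from typed models rather than pre-built strings:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class PythonSignature:
    name: str
    args: Tuple[str, ...]   # pre-rendered argument strings (a simplification)
    returns: str            # rendered return annotation, used only by pyi
    pyi: bool               # print as a pyi stub vs. a python_arg_parser string

    def signature_str(self) -> str:
        body = ", ".join(self.args)
        if self.pyi:
            # pyi type hints are full `def` stubs with a return annotation
            return f"def {self.name}({body}) -> {self.returns}: ..."
        # python_arg_parser strings carry no `def` and no return type
        return f"{self.name}({body})"

    def signature_str_vararg(self) -> Optional[str]:
        # Only some ops (a single positional List[int] argument) get a vararg
        # variant; the detection logic is elided here.
        return None
```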

**Calling out the gap between python_arg_parser vs. pyi**

The two formats are notably different, so I don’t think we can expect to unify them completely. That said, I encountered a number of differences in the pyi codegen that looked wrong- I tried to call them out in the PR, to be removed later. Just as an example, looking at the `svd` signature in the python_arg_parser vs. the pyi type hint:

python_arg_parser
```
static PythonArgParser parser({
  "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
}, /*traceable=*/true);
```

pyi
```
def svd(input: Tensor, some: _bool=True, compute_uv: _bool=True, *, out: Optional[Tensor]=None) -> namedtuple_U_S_V: …
```

The two have obvious syntactic differences that we probably don’t plan on changing: the python_arg_parser doesn’t include `def` or return types, and it includes the type hint before the variable name. But the type of `out` in pyi is probably wrong, since `svd` has multiple output params. I tried to clearly call out any instances of the pyi codegen diverging in a way that looks buggy, so we can clean it up in a later PR (see the comments for details).
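
For illustration only, a hypothetical corrected stub in which out reflects svd's three outputs; the exact Optional/Tuple shape is an assumption, and the actual fix is left to a later PR:

```python
from typing import NamedTuple, Optional, Tuple

class Tensor: ...   # stand-in for torch.Tensor in this sketch
_bool = bool        # pyi files alias builtins like this

class namedtuple_U_S_V(NamedTuple):
    U: Tensor
    S: Tensor
    V: Tensor

def svd(input: Tensor, some: _bool = True, compute_uv: _bool = True, *,
        out: Optional[Tuple[Tensor, Tensor, Tensor]] = None) -> namedtuple_U_S_V: ...
```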

Another particularly ugly “bug” that I kept in to maintain byte-for-byte compatibility is the fact that the pyi codegen groups operator overloads together. It turns out that the only reason it does this (as far as I can tell) is that it tacks on an out argument to signatures that don’t have one, if ANY overload of that op has an out variant.

E.g. consider the pyi type hints generated for `nanmedian` in `_VF.pyi`:
```
@overload
def nanmedian(input: Tensor, *, out: Optional[Tensor]=None) -> Tensor: ...
@overload
def nanmedian(input: Tensor, dim: _int, keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
@overload
def nanmedian(input: Tensor, dim: Union[str, ellipsis, None], keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
```

And the corresponding native_functions.yaml entries:
```
- func: nanmedian(Tensor self) -> Tensor
- func: nanmedian.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.dim_values(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
- func: nanmedian.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.names_dim_values(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
```

Signature 2 corresponds to entries 2 and 3 in native_functions, and Signature 3 corresponds to entries 4 and 5. But signature 1 has an optional out argument, even though entry 1 in native_functions.yaml has no out variant.

I’d like to delete that logic in a later PR; that will also have the added benefit of no longer requiring the pyi codegen to group overloads together. We can then just operate independently on each PythonSignatureGroup.
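
In code, the quirk amounts to something like this hypothetical helper (names and structure are assumptions, not the actual gen_pyi.py logic):

```python
from typing import List, Sequence

def emit_overload_hints(arg_lists: Sequence[Sequence[str]],
                        any_overload_has_out: bool) -> List[str]:
    hints = []
    for args in arg_lists:
        args = list(args)
        # The quirk: if ANY overload of the op has an out variant, every
        # emitted overload gets an optional out argument tacked on, even
        # overloads whose native_functions.yaml entry has no out variant.
        if any_overload_has_out and not any(a.startswith("out:") for a in args):
            if "*" not in args:
                args.append("*")   # start the keyword-only section
            args.append("out: Optional[Tensor]=None")
        hints.append(", ".join(args))
    return hints
```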

**More detailed accounting of the changes**

Per file:

gen_python_functions.py
- `load_signatures()` can now skip deprecated signatures. Needed because pyi only includes the function variants of deprecated signatures, and skips their method variants (maybe we should add them in…?); see the sketch after this list.
- Moved `namedtuple_fieldnames` into python.py
- `group_overloads()` can now opt to not sort the overloads (needed for byte-for-byte compatibility; pyi doesn’t sort for some reason)
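
A sketch of the new knob on load_signatures; the exact parameter list is an assumption:

```python
def load_signatures(native_yaml_path: str, deprecated_yaml_path: str, *,
                    method: bool, skip_deprecated: bool = False,
                    pyi: bool = False):
    # Load PythonSignatureGroups from native_functions.yaml, optionally
    # merging in deprecated signatures; pyi callers pass skip_deprecated=True
    # when emitting methods, since pyi skips deprecated method variants.
    ...
```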

python.py:
- Gave `PythonSignature` and `PythonSignatureDeprecated` a `pyi` flag that tells them whether or not to print themselves in pyi vs. python_arg_parser format
- Added a `PythonReturns` dataclass, which is now a member of PythonSignature. It is only used by pyi. I found this useful because python returns need to know how to deal with named tuple returns properly. I also moved `namedtuple_fieldnames` into this file from gen_python_functions

gen_pyi.py
- Merged `get_py_torch_functions` and `get_py_variable_methods` into a single function, since they’re very similar
- Lifted out all of the pyi type hint type-mapping mess and dropped it into python.py. This required updating the mapping to deal with NativeFunction objects instead of the outputs of Declarations.yaml (this was most of the logic in `type_to_python`, `arg_to_type_hint`, and `generate_type_hints`).  `generate_type_hints` is now a small orchestration function that gathers the different signatures for each PythonSignatureGroup.
- NamedTuples are now generated by calling `PythonReturns.named_tuple()` (in `generate_named_tuples()`), rather than appending to a global list
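
A hedged sketch of that flow; the real PythonReturns works off typed Return models rather than bare field names:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class PythonReturns:
    field_names: Tuple[str, ...]   # empty unless the op returns a named tuple

    def named_tuple(self) -> Optional[str]:
        # e.g. ("values", "indices") yields a namedtuple_values_indices decl
        if not self.field_names:
            return None
        typename = "namedtuple_" + "_".join(self.field_names)
        fields = ", ".join(f'("{n}", Tensor)' for n in self.field_names)
        return f'{typename} = NamedTuple("{typename}", [{fields}])'
```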

A lot of hardcoded pyi signatures still live in `gen_pyi.py`. I didn’t look too closely at whether any of that can be removed as part of this PR.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25343802

Pulled By: bdhirsh

fbshipit-source-id: f73e99e1afef934ff41e4aca3dabf34273459a52
2020-12-07 10:39:38 -08:00
Edward Yang
742903c0df Move argument grouping into FunctionSchema (#48195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48195

The general approach is to change Arguments: `positional` is split into `pre_self_positional`, `self_arg` and `post_self_positional`, and `kwarg_only` is split into `pre_tensor_options_kwarg_only`, `tensor_options` and `post_tensor_options_kwarg_only`. The splits are as you'd expect: we extract out the self argument and the tensor options arguments, and record the other arguments that came before and after. To do this, we move the logic in `group_arguments` to the parsing process.

Some fuzz in the process:
* I renamed `ThisArgument` to `SelfArgument`, since we don't actually use the terminology "this" outside of C++ (and the model is Python-biased)
* I kept the `group_arguments` function, which now just reads out the arguments from the structured model in the correct order. In the long term, we should get rid of this function entirely, but for now I kept it as is to reduce churn.
* I decided to arbitrarily say that when self is missing, everything goes in "post-self", but when tensor options is missing, everything goes in "pre-tensor-options". This was based on where you typically find the argument in question: self is usually at the front (so most args are after it), while tensor options are typically at the end (so most args go before it).
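
A hedged sketch of the resulting structure; the field names come from this commit message, while types and the TensorOptions details are simplified assumptions:

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

@dataclass(frozen=True)
class Argument:
    name: str

@dataclass(frozen=True)
class Arguments:
    # positional = pre_self_positional + [self_arg] + post_self_positional
    pre_self_positional: Tuple[Argument, ...]
    self_arg: Optional[Argument]
    post_self_positional: Tuple[Argument, ...]
    # kwarg_only = pre_tensor_options + tensor_options + post_tensor_options
    pre_tensor_options_kwarg_only: Tuple[Argument, ...]
    tensor_options: Tuple[Argument, ...]   # empty when the op has none
    post_tensor_options_kwarg_only: Tuple[Argument, ...]
    out: Tuple[Argument, ...]

def group_arguments(a: Arguments) -> Iterator[Argument]:
    # group_arguments now just reads the structured model back out in order
    yield from a.pre_self_positional
    if a.self_arg is not None:
        yield a.self_arg
    yield from a.post_self_positional
    yield from a.pre_tensor_options_kwarg_only
    yield from a.tensor_options
    yield from a.post_tensor_options_kwarg_only
    yield from a.out
```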

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25231166

Pulled By: ezyang

fbshipit-source-id: 25d77ad8319c4ce0bba4ad82e451bf536ef823ad
2020-12-02 07:57:11 -08:00
Edward Yang
ba5686f8c5 Refactor argument fields in FunctionSchema to Arguments (#48182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48182

I'm planning to add a bunch more argument fields following
https://github.com/pytorch/pytorch/pull/45890#discussion_r503646917 and
it will be a lot more convenient if the arguments get to live
in their own dedicated struct.  The type checker will tell you if
I've done it wrong.  No change to output.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25057897

Pulled By: ezyang

fbshipit-source-id: dd377181dad6ab0c894d19d83408b7812775a691
2020-12-02 07:57:06 -08:00
Jiakai Liu
de284b6d35 [pytorch][codegen] add autograd data model (#48249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48249

Introduced autograd related data models at tools.codegen.api.autograd.

Migrated load_derivatives.py to produce the new data models from derivatives.yaml.
It has a clean mypy-strict result.

Changed both gen_autograd_functions.py and gen_variable_type.py to consume
the new data model.

Added type annotations to gen_autograd_functions.py - it has a clean mypy-strict
result except for the .gen_autograd import (so I haven't added it to the strict
config in this PR).

To limit the scope of the PR, gen_variable_type.py is not refactored, and the
main structure of load_derivatives.py / gen_autograd_functions.py is kept. We
only make necessary changes to make it work.
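
As a hedged illustration of the new models (class and field names here are assumptions; the real definitions in tools.codegen.api.autograd carry more information):

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Derivative:
    formula: str                 # e.g. "grad * self.cos()" from derivatives.yaml
    var_names: Tuple[str, ...]   # the inputs this formula differentiates w.r.t.

@dataclass(frozen=True)
class DifferentiabilityInfo:
    name: str                          # operator name keyed in derivatives.yaml
    derivatives: Sequence[Derivative]  # one entry per derivative formula
```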

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25086561

Pulled By: ljk53

fbshipit-source-id: 1f43ab0931d9814c24683b9a48ca497c5fc3d729
2020-11-19 21:47:05 -08:00
Jiakai Liu
a36e646878 [pytorch][codegen] simplify python signature creation logic (#47977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47977

Avoid calling the CppSignatureGroup API - the python signature shouldn't depend on
the cpp signature. Still use cpp.group_arguments() to group TensorOptions.

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24976334

Pulled By: ljk53

fbshipit-source-id: 5df5a7bbfd2b8cb460153e5bea4d91e65f716390
2020-11-18 12:26:50 -08:00
Edward Yang
cdc2d2843b Structured kernel definitions (#45277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45277

Implements structured kernels as per https://github.com/pytorch/rfcs/pull/9 and ports upsample_nearest1d to use the framework.

The general structure of this diff:

- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model
* NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model (see the sketch after this list).
* When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by actual operators we port as we go.
* The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they're a bit different, as they call your meta functions and make reference to the actual implementations in out.
- Then there's a port of `upsample_nearest1d`; easiest to review by just looking at what the final code looks like.
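
A hedged sketch of the grouping invariants described above; a simplified take on tools.codegen.model, not the actual classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NativeFunction:
    name: str                                  # e.g. "upsample_nearest1d.out"
    structured: bool = False
    structured_delegate: Optional[str] = None  # e.g. "upsample_nearest1d.out"

@dataclass(frozen=True)
class StructuredNativeFunctions:
    # Invariant: structured kernels have both a functional and an out variant.
    functional: NativeFunction
    out: NativeFunction
    inplace: Optional[NativeFunction] = None

    def __post_init__(self) -> None:
        assert self.out.structured, "structured: True lives on the out variant"
        assert self.functional.structured_delegate == self.out.name, \
            "the functional variant delegates to the out function"
```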

Missing pieces:

- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`

* Functional: before 1314105, after 1150705
* Out: before 915705, after 838405

These numbers may be jittered up to +-16400 (which is the difference when I tested against an unaffected operator `at::upsample_linear1d`), though that may also be because unrelated changes affected all operators globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D24253555

Test Plan: Imported from OSS

Reviewed By: smessmer

Pulled By: ezyang

fbshipit-source-id: 4ef58dd911991060f13576864c8171f9cc614456
2020-11-17 15:24:43 -08:00
Jiakai Liu
d91cefb0d8 [pytorch][codegen] migrate gen_annotated_fn_args.py to new codegen model (#47745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745

This is a relatively small codegen. Reintroduced 'simple_type' to preserve
old codegen output.

It depends on some methods defined in gen_python_functions.py - the next PR will
clean up the remaining Declarations.yaml methods in gen_python_functions.py.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Differential Revision: D24885068

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: ljk53

fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f
2020-11-14 02:24:39 -08:00
Jiakai Liu
499d2fad98 [pytorch] factor out return_names api (#47437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47437

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24808213

Pulled By: ljk53

fbshipit-source-id: 8ec6d58952fd677ab2d97e63b060cafda052411a
2020-11-09 12:39:37 -08:00
Jiakai Liu
16c72a5a6b [pytorch] continue to rewrite gen_python_functions.py with typed models (#46978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46978

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24589210

Pulled By: ljk53

fbshipit-source-id: e0c7e5b3672b41983f321400c2e2330d1462e76e
2020-11-08 01:34:12 -08:00