Commit Graph

344 Commits

Author SHA1 Message Date
Edward Z. Yang
ec91f42d79 Sync up DispatchKey in model with C++ (#81770)
I also dropped some keys that are not useful for codegen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81770
Approved by: https://github.com/bdhirsh
2022-07-21 21:23:54 +00:00
Peter Bell
5f2e31797a Replace _dtype_default_type_hack (#81479)
Currently any function with a default dtype other than None has to be
manually entered into this function. Instead, this reads the default
directly from `native_functions.yaml`. In order to do this, I also
change `PythonSignatureGroup` to take `tensor_options_args` from the
functional variant since the out variant doesn't actually have tensor
options arguments to take the default values from.

Also note that we need to use `default_init` instead of `default`
because the out argument version doesn't have a `tensor_options`
argument to extract the default value from and so the PythonSignature
objects wouldn't match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81479
Approved by: https://github.com/albanD
2022-07-21 16:42:49 +00:00
Brian Hirsh
ed0091f8db fix view_copy kernel striding check logic (#81553)
The composite kernel for `view_copy` that we generate is special-cased a bit for efficiency to avoid having to do extra clones in some cases.

That logic was slightly wrong though, and is fixed here (it needs to mirror the logic in `reshape()`).

It manifested as a debug assert firing for Lazy Tensor, which I confirmed no longer fires when running this script:
```
# ran with "python test_ltc_only_torch.py --device=lazy --sync=1 --nvtx=1"
import torch

import torch._lazy
from torch._lazy.ts_backend import init as init_ts_backend
init_ts_backend()
torch.manual_seed(42)
from transformers import BertForSequenceClassification

def parse_args():
  import argparse
  parser = argparse.ArgumentParser(description='')
  parser.add_argument('--device', type=str, default='cuda')
  parser.add_argument('--sync', type=bool, default=False)
  parser.add_argument('--nvtx', type=bool, default=False)
  return parser.parse_args()

args = parse_args()

device = args.device
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)

from transformers import AdamW
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text_batch = ["I love Pixar.", "I don't care for Pixar."]
encoding = tokenizer(text_batch, return_tensors='pt', padding=True, truncation=True)
input_ids = encoding['input_ids'].to(device)
attention_mask = encoding['attention_mask'].to(device)

model = model.to(device)
model.train()

no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)
labels = torch.tensor([1,0]).unsqueeze(0).to(device)
for _ in range(6):
  torch.cuda.nvtx.range_push(f'Iter{_}')

  torch.cuda.nvtx.range_push('F')
  outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
  if args.sync:
    torch._lazy.mark_step()
    torch._lazy.wait_device_ops()
  torch.cuda.nvtx.range_pop()

  loss = outputs.loss

  torch.cuda.nvtx.range_push('B')
  optimizer.zero_grad()
  loss.backward()
  if args.sync:
    torch._lazy.mark_step()
    torch._lazy.wait_device_ops()
  torch.cuda.nvtx.range_pop()

  torch.cuda.nvtx.range_push('O')
  optimizer.step()
  if args.sync:
    torch._lazy.mark_step()
    torch._lazy.wait_device_ops()
  torch.cuda.nvtx.range_pop()

  torch.cuda.nvtx.range_pop()
torch._lazy.mark_step()
torch._lazy.wait_device_ops()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81553
Approved by: https://github.com/ezyang
2022-07-19 13:47:05 +00:00
Mengwei Liu
9f873ed7c8 [torchgen] support codegen'd C++ API for a mixture of namespaces (#81581)
Summary:
In #77710 I introduces some hack to allow static dispatch to take namespaces. After we introduced namespace into ops and kernels, we don't have to pass namespace into `static_dispatch()`; instead we will generate ops with the kernel namespace for `Functions.h`. After this diff:

If we have a yaml file looking like this:
```
- func: op_1(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: at::op_1_kernel # ATen kernel

- func: op_2(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: custom::op_2_kernel # custom kernel
```
`Functions.h` will contain the following C++ APIs:
```
TORCH_API inline at::Tensor & op_1(at::Tensor & self) {
  return at::cpu::op_1_kernel(self);
}

TORCH_API inline at::Tensor & op_2(at::Tensor & self) {
  return custom::cpu::op_2_kernel(self);
}
```

Test Plan: Rely on CI

Differential Revision: D37900753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81581
Approved by: https://github.com/iseeyuan
2022-07-19 07:46:36 +00:00
Sergii Dymchenko
19a296486b Remove unused local variables from generator.py (#81505)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81505
Approved by: https://github.com/seemethere
2022-07-18 16:26:59 +00:00
Huy Do
a4647cc1fa Apply ufmt linter to all py files under torchgen (#81570)
Previous batches:
* https://github.com/pytorch/pytorch/pull/81285
* https://github.com/pytorch/pytorch/pull/81335

We have multiple batches here to minimize merge conflicts and reviewing process. Once everything has been formatted by ufmt (black+usort), the current black linter will be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81570
Approved by: https://github.com/ezyang
2022-07-16 03:52:25 +00:00
Edward Z. Yang
fca03eeec1 Make proxy tensor support item() calls on torch.tensor constants (#81192)
This PR is doing a few interrelated things, all of which are necessary to get correctness. Read the comment in torch/fx/experimental/proxy_tensor.py for the high level overview.

Let's break down the parts of this PR:

* Bug fix where `enable_torch_dispatch_mode` with `None` doesn't work. This make `enable_torch_dispatch_mode(current_mode.inner)` work which is the basis for how we temporarily disable fake tensor mode.
* Bug fix for when fake tensor mode is combined with a non-mode tensor subclass. This actually could be ablated from this PR but it affects where the logic for allowing non fake tensor inputs with lift goes, so it's all in here in one go. There are some relevant tests for the fix in fake tensor, but it turns out I didn't need this because I'm always using proxy tensors as a mode (which ensures the ordering is right.)
* New `lift_fresh` view operator.  Note that like lift, we have to manually write the functionalize kernel for these functions.
* The actual change, which is to save constants when we see them in the proxy tensor mode, and then propagate them as we go (because otherwise you'll handle mutations on constants incorrectly--see test.)

This is mildly BC-breaking if anyone was previously interposing on
at::lift, but this operator was relatively new and I checked
functorch which has no explicit reference to lift.  So I think it
should not be too disruptive.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81192
Approved by: https://github.com/samdow, https://github.com/bdhirsh
2022-07-15 03:53:40 +00:00
Sergii Dymchenko
3dea7fe6f3 Remove unused local variables from gen.py (#81508)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81508
Approved by: https://github.com/huydhn
2022-07-15 01:26:32 +00:00
Tim Gates
3a87b47de9 docs: Fix a few typos (#81435)
There are small typos in:
- caffe2/python/recurrent.py
- test/distributed/test_c10d_nccl.py
- test/test_fx.py
- torch/csrc/jit/runtime/autodiff.cpp
- torchgen/gen.py

Fixes:
- Should read `propagation` rather than `propogation`.
- Should read `multiplied` rather than `multuplied`.
- Should read `eliminate` rather than `elminate`.
- Should read `dispatcher` rather than `disaptcher`.

Semi-automated pull request generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81435
Approved by: https://github.com/ngimel
2022-07-14 04:20:26 +00:00
Huy Do
8f07b7a069 Fix circular import error in torchgen (#81355)
This also formats `tools/pyi/gen_pyi.py` with `usort` to test the fix because that is how the bug was discovered. The
usort-formatted `gen_pyi.py` should work now without any issues

Fixes #81294

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81355
Approved by: https://github.com/ezyang
2022-07-13 03:16:38 +00:00
Mengwei Liu
80f6d2e9e6 [torchgen] Extract out schema registration logic into a function (#80780)
Summary:
A followup to  #78015 and #79733. In those PRs I introduced custom namespace support into:
* `Register<DispatchKey>.cpp`
* `RegisterSchema.cpp`
* `NativeFunctions.h`

This PR extracts out logic that generates schema registration code (used in `RegisterSchema.cpp`) into a function so that it can be easily tested and reused. Added unit test to cover the logic as well.

Test Plan: Rely on newly added unit tests.

Differential Revision: D37581186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80780
Approved by: https://github.com/iseeyuan
2022-07-12 21:52:42 +00:00
Brian Hirsh
f84b30f790 fix functionalization regression introduced by ProxyTorchDispatchMode, migrate testing to make_fx (#80416)
`ProxyTorchDispatchMode` was added recently as part of `make_fx`, which was secretly causing the meta tensor calls used inside of functionalization to get baked into the graph. It also wasn't caught because the functionalization tests in core don't use `make_fx`, and the tests in functorch aren't as comprehensive.

Now that `make_fx` is in core, I also ported the functionalization test infra over to use it, which would have caught the regression. This also makes the tests cleaner, since mode-based tracing lets us pick up factory functions in the trace output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80416
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-07-12 01:46:16 +00:00
Mengwei Liu
5c8a9803c8 [torchgen] Support multiple namespace in NativeFunctions.h (#79733)
Summary:
This is a follow up to #78015. This PR
* introduces namespace logic for generating `NativeFunctions.h`.
* adds helper function to extract namespace from string
* relaxes the constraint on the levels we support for custom kernel namespace to 2

Test Plan:
Yaml entry:
```
- func: unsqueeze.out(Tensor(a) self, int dim, *, Tensor(a!) out) -> Tensor(a!)
  variants: function
  device_check: NoCheck
  dispatch:
    CPU: custom_1::custom_2::unsqueeze
```

Generated `NativeFunctions.h`:

```
namespace custom_1 {
namespace custom_2 {
namespace native {
    TORCH_API at::Tensor & unsqueeze(const at::Tensor & self, int64_t dim, at::Tensor & out);
} // namespace native
} // namespace custom_2
} // namespace custom_1

```

Differential Revision: D37198111

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79733
Approved by: https://github.com/bdhirsh
2022-07-08 21:56:52 +00:00
Brian Hirsh
960758b0b7 fix overload ambiguity with functional ops; fix _foreach op grouping (#80556)
This should fix the last issue that @anijain2305 hit when running ResNet with TorchDynamo <> functionalization.

Today if you try to call an `OpOverloadPacket` from python with some arguments, we will use the types of those arguments to perform overload resolution. With some functional variants of ops, this can be ambiguous.

Today this affects just one op: `_fused_moving_avg_obs_fq_helper`, although it would potentially affect e.g. `native_batch_norm` in the future.

Example:
```
# There are technically two overloads:
# torch.ops.aten._fused_moving_avg_obs_fq_helper.default (returns 2 argument, mutates 4 of its inputs inplace)
# torch.ops.aten._fused_moving_avg_obs_fq_helper.functional (returns 6 argument, mutates none of its inputs)

# We pick the wrong one - no way to know that we should pick the functional one, just from the call site.
outs = torch.ops.aten._fused_moving_avg_obs_fq_helper(a, a, a, a, a, a, a, 1.0, 0, 1, 0)
# raises an error - tries to call the overload with only 2 returns
return _fused_moving_avg_obs_fq_helper_functional[5]
```

Specifically, functionalization will bake `_fused_moving_avg_obs_fq_helper.functional` into the graph, but when AOTAutograd tries to compile with TorchScript, it needs to remove the overload name (TS doesn't know how to parse overload names directly, so we need to remove the overload name and let it infer the right overload at runtime later- so it picks the wrong one).

The situation is pretty similar to inplace; `ops.aten.add` and `ops.aten.add_` represent two different `OverloadPacket` objects; they can't be overloads of the same op, because their schemas would be ambiguous - the alias annotations are different, but that isn't enough to disambiguate).

In this PR, I try to fix the situation in a pretty similar way to how we handle `inplace` in the data model: `inplace` ops get their own base operator name, but they are represented as a flag inside of `BaseOperatorName` in the data model.

Two other important changes that I made as part of this PR:

(1) Originally, there were ~100 different `*_functional` operators: e.g. we had operators named `resize.functional` and `zero.functional`. The `_functional` bit isn't actually necessary in most cases: it's only necessary for operators that **also** have a `SchemaKind.mutable` variant, where `_fused_moving_avg_obs_fq_helper` is the only op that fits that description today. So I removed the unnecessary notion of "functional" from those other ops. I also added a bunch of assertions to force this restriction.

I think that makes more sense in the long run, because it eliminates an unnecessary difference in the model. E.g. we don't have `add_.Tensor` and `add.Tensor_functional`. We just have `add_.Tensor` and `add.Tensor`.

(2) I noticed that we actually still weren't pairing up a bunch of `_foreach` operators correctly, because their input arguments were different (`self` vs. `tensors`). Since they're private API's, I went ahead and changed the argument names directly so they get matched up. Before this PR, we were generating a separate `_foreach_add` and `_foreach_add.functional` variant in a bunch of cases, that really did the same thing (but happened to have a different name for the first argument).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80556
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-07-06 12:45:11 +00:00
Edward Z. Yang
805120ab57 See if we can elide TORCH_API from inline functions. (#80609)
See https://github.com/pytorch/pytorch/issues/80604

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80609
Approved by: https://github.com/malfet
2022-06-30 23:31:38 +00:00
Brian Hirsh
c2d395cf8e functionalization <> LTC integration (take 3) (#80251)
new PR for https://github.com/pytorch/pytorch/pull/75527.

It looks like there's a bug in the windows CI scripts that was causing
flaky failures, that disappear when I create a new PR. example failure:
https://github.com/pytorch/pytorch/runs/6999272635?check_suite_focus=true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80251
Approved by: https://github.com/wconstab
2022-06-26 23:10:21 +00:00
Nikita Shulga
f11cce309b [MPS] Add equal operator (#80195)
Which is, in essence is composite of `eq`->`all`->`item`
`native/mps/operators/Equal.cpp` is an almost verbatim copy of `native/cuda/Equal.cpp`

Fix codegen by generating MPSFunctions headers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80195
Approved by: https://github.com/albanD
2022-06-25 12:40:52 +00:00
Peter Bell
2c43876f64 AT_DISPATCH: Expose switch-case like macro syntax (#79978)
This expands the `AT_DISPATCH` macros to enable writing your own
`AT_DISPATCH_SWITCH` statements with multiple `AT_DISPATCH_CASE`
labels. So, where previously you may have written:

```cpp
if (iter.common_dtype() == kBool) {
  my_bool_kernel(iter);
} else {
  AT_DISPATCH_INTEGRAL_TYPES(iter.common_dtype(), "my_kernel", [&] {
    ...
  });
}
```

You can now instead write

```cpp
AT_DISPATCH_SWITCH(iter.common_dtype(), "my_kernel",
  AT_DISPATCH_CASE(kBool, [&] { my_kernel_bool(iter); })
  AT_DISPATCH_CASE_INTEGRAL_TYPES([&] { ... })
);
```

The macro syntax is a bit ugly, however the benefits are:
- Greater flexibility, as the kernel code doesn't have to be shared
  for all dtypes.
- Selective build and RECORD_KERNEL_FUNCTION work even for single
  dtype specializations such as the bool case in the example.
- The compiler sees a single switch for all types, which should be
  easier to optimize into a jump table.
- We also now get errors if the same scalar type is handled twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79978
Approved by: https://github.com/ezyang
2022-06-25 04:06:56 +00:00
Henry Tu
fc6b645fe2 Prevent out of bounds access to null LTC operands (#80060)
When constructing a lazy::Node, [null operands (optional values that aren't included) are dropped](30fb2c4aba/torch/csrc/lazy/core/ir.cpp (L82-L84)), so it’s possible for the stored operand list to be a different length than the one that was passed into the constructor.

This can become a problem during the call to `CanBeReused` in the autogen `LazyIr.h` code. For example:

```
  bool CanBeReused(const torch::lazy::Value& input, const c10::optional<torch::lazy::Value>& weight, const c10::optional<torch::lazy::Value>& bias, const c10::optional<torch::lazy::Value>& running_mean, const c10::optional<torch::lazy::Value>& running_var, const bool& training, const double& momentum, const double& eps) const {
    size_t i = 0;
    std::cout << "Num operands: " << operands().size() << std::endl;
    return (operand(i++) == input &&
        operand(i++) == weight.value_or(kNullValue) &&
        operand(i++) == bias.value_or(kNullValue) &&
        operand(i++) == running_mean.value_or(kNullValue) &&
        operand(i++) == running_var.value_or(kNullValue) &&
        this->training == training &&
        this->momentum == momentum &&
        this->eps == eps);
  }
```

Here we operate under the assumption that the number of operands stored in the `lazy::Node` is equal to the number of operands originally passed into the constructor. Recall that we drop any null operands though, so it’s possible to inadvertently access an invalid index at this point.

This PR addresses this issue by adding a new nullable_operand method which falls back to a null value instead of producing an index error when going out of bounds.

This should solve the issue found at https://github.com/pytorch/pytorch/pull/79637#issuecomment-1162044545

cc: @antoniojkim @ke1337 @wconstab @desertfire
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80060
Approved by: https://github.com/desertfire
2022-06-24 20:39:37 +00:00
Max Podkorytov
bf75708ce4 [static-runtime] add nnc codegen for aten::div (#76903)
Differential Revision: D36151087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76903
Approved by: https://github.com/mikeiovine
2022-06-22 05:47:44 +00:00
Nikolay Korovaiko
efc7343743 Revert "Revert "Put symint overloads on a different name"" (#79680)
This relands https://github.com/pytorch/pytorch/pull/79281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79680
Approved by: https://github.com/malfet
2022-06-21 07:06:33 +00:00
Shunting Zhang
0d909d3cff add a new FunctionSchema kind called scratch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79659

The scratch op expose intermetidate/scratch tensors used in kernel implementation
as kernel input arguments so a memory planning algorithm can plan memory for those
tensors.

Differential Revision: [D37194429](https://our.internmc.facebook.com/intern/diff/D37194429/)

Approved by: https://github.com/bdhirsh
2022-06-20 16:34:16 +00:00
Brian Hirsh
adf8060600 add a new alias key for functional to view op decompositions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79615

Approved by: https://github.com/zou3519
2022-06-15 23:18:09 +00:00
PyTorch MergeBot
b9bb52d97b Revert "Put symint overloads on a different name"
This reverts commit 213a8fc992.

Reverted https://github.com/pytorch/pytorch/pull/79281 on behalf of https://github.com/bigfootjon due to Diff reverted internally
2022-06-15 17:15:21 +00:00
Edward Z. Yang
213a8fc992 Put symint overloads on a different name
Due to implicit conversion shenanigans, having both IntArrayRef
and SymIntArrayRef overloads makes {} ambiguous.  While we could
fix this by making a single unified type that accepts all the overloads
we want, an easier fix was to just push the SymIntArrayRef overload
to its own name.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79281

Approved by: https://github.com/suo
2022-06-12 14:36:39 +00:00
anjali411
38350acf8f Autogen Tags enum, and allow specifying tags while defining an op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79322

Approved by: https://github.com/albanD
2022-06-11 00:29:32 +00:00
Mengwei Liu
24050a5801 [RFC][Codegen] Add custom namespace support (#78015)
Summary:
Adding a feature to allow user to specify namespaces for operator and kernels.

# Feature
There's a feature request to allow DSL to:
1. take in an operator namespace other than `aten`.
2. take in a kernel that is in a different namespace than `at::native`.

For both features, we only allow user to have a single layer of namespace for the sake of simplicity. If user specify `custom::function` as kernel, the codegen will depend on `custom::native::function` where `native` is hardcoded.

# Proposal

For feature 1, add a `namespace` attribute to data class `NativeFunction`. The namespace will be extract out by matching pattern "::" on the `func` variable. For `NativeFunctionsGroup` there's an assumption that all variants (function, inplace, out) will have the same namespace. By default (if not specified) the namespace will be "aten".

For feature 2, add a `namespace` attribute to `BackendMetadata` class, similarly match pattern "::" on the kernel field. Remove the `cpp_namespace` field from `register_dispatch_key` data class. By default (if not specified) the namespace for a kernel would be "at::native".

Test Plan:
Example yaml entries:
```
- func: custom::gelu.out(Tensor self, *, str approximate='none', Tensor(a!) out) -> Tensor(a!)
  structured: True
  structured_inherits: TensorIteratorBase
  device_check: NoCheck   # TensorIterator
  python_module: nn
  dispatch:
    CPU: custom::gelu_out_cpu
    CUDA: custom::gelu_out_cuda
    MPS: custom::gelu_out_mps

- func: custom::gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!)
  structured_delegate: gelu.out
  device_check: NoCheck   # TensorIterator
  python_module: nn
  dispatch:
    NestedTensorCPU, NestedTensorCUDA: custom::NestedTensor_gelu_

- func: custom::gelu(Tensor self, *, str approximate='none') -> Tensor
  structured_delegate: gelu.out
  device_check: NoCheck   # TensorIterator
  python_module: nn
  dispatch:
    MkldnnCPU: custom::mkldnn_gelu
    QuantizedCPU: custom::gelu_quantized_cpu
    NestedTensorCPU, NestedTensorCUDA: custom::NestedTensor_gelu
```

see generated code:

`RegisterCPU.cpp`:
```
TORCH_LIBRARY_IMPL(aten, CPU, m) {
  ...
}
TORCH_LIBRARY_IMPL(custom, CPU, m) {
    m.impl("gelu", TORCH_FN(wrapper_gelu));
    m.impl("gelu.out", TORCH_FN(wrapper_gelu_out_out));
    m.impl("gelu_", TORCH_FN(wrapper_gelu_));
};
```
```
struct structured_gelu_out_cpu_inplace final : public custom::native::structured_gelu_out_cpu {
    structured_gelu_out_cpu_inplace(Tensor& self) : outputs_{std::ref(self)} {}

    void set_output_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {

        const auto& out = outputs_[output_idx].get();
        check_inplace(out, sizes, options);

        auto maybe_proxy = maybe_create_proxy(out, sizes, strides, options);
        if (C10_UNLIKELY(maybe_proxy.has_value())) {
            proxy_outputs_[output_idx] = c10::ExclusivelyOwned<Tensor>(std::move(maybe_proxy).value());
        }

        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        custom::native::structured_gelu_out_cpu::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }

    void set_output_raw_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {

        const auto& out = outputs_[output_idx].get();
        check_inplace(out, sizes, options);

        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        custom::native::structured_gelu_out_cpu::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }

    const Tensor& maybe_get_output(int64_t output_idx) override {
      return proxy_outputs_[output_idx].has_value() ? **proxy_outputs_[output_idx] : outputs_[output_idx].get();

    }
    std::array<std::reference_wrapper<Tensor>, 1> outputs_;
    std::array<c10::optional<c10::ExclusivelyOwned<Tensor>>, 1> proxy_outputs_;
};
```

`RegisterSchema.cpp`
```
TORCH_LIBRARY(aten, m) {
  ...
}
TORCH_LIBRARY(custom, m) {
    m.def("gelu.out(Tensor self, *, str approximate='none', Tensor(a!) out) -> Tensor(a!)");

    m.def("gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!)");

    m.def("gelu(Tensor self, *, str approximate='none') -> Tensor");
};
```

Differential Revision: D36558459

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78015
Approved by: https://github.com/bdhirsh
2022-06-10 21:04:36 +00:00
Brian Hirsh
7b3a0ff87a Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-10 17:27:47 +00:00
George Qi
a90f006fe5 add strides to slow path
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78610

Approved by: https://github.com/ezyang
2022-06-10 16:59:14 +00:00
PyTorch MergeBot
4b82ef7928 Revert "Port index.Tensor to structured kernels."
This reverts commit cfd84125bd.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests cfd84125bd
2022-06-08 20:16:10 +00:00
Brian Hirsh
cfd84125bd Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-06-08 18:17:52 +00:00
Richard Zou
9da5defff6 Package config/template files with torchgen (#78942)
Package config/template files with torchgen

This PR packages native_functions.yaml, tags.yaml and ATen/templates
with torchgen.

This PR:
- adds a step to setup.py to copy the relevant files over into torchgen
- adds a docstring for torchgen (so `import torchgen; help(torchgen)`
says something)
- adds a helper function in torchgen so you can get the torchgen root
directory (and figure out where the packaged files are)
- changes some scripts to explicitly pass the location of torchgen,
which will be helpful for the first item in the Future section.

Future
======

- torchgen, when invoked from the command line, should use sources
in torchgen/packaged instead of aten/src. I'm unable to do this because
people (aka PyTorch CI) invokes `python -m torchgen.gen` without
installing torchgen.
- the source of truth for all of these files should be in torchgen.
This is a bit annoying to execute on due to potential merge conflicts
and dealing with merge systems
- CI and testing. The way things are set up right now is really fragile,
we should have a CI job for torchgen.

Test Plan
=========
I ran the following locally:

```
python -m torchgen.gen -s torchgen/packaged
```
and verified that it outputted files.

Furthermore, I did a setup.py install and checked that the files are
actually being packaged with torchgen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78942
Approved by: https://github.com/ezyang
2022-06-07 13:33:55 +00:00
Sergii Dymchenko
0fdc1caf02 Cleanup some Python2-related code (#78864)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78864
Approved by: https://github.com/janeyx99, https://github.com/jbschlosser
2022-06-06 17:40:02 +00:00
Brian Hirsh
67b27a7bae generate kernels for codegend out= operators
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78626

Approved by: https://github.com/ezyang, https://github.com/JacobSzwejbka, https://github.com/larryliu0820
2022-06-06 15:36:28 +00:00
PyTorch MergeBot
bcb424c8cf Fix #78675
Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78699

Approved by: https://github.com/tugsbayasgalan
2022-06-04 01:07:24 +00:00
Linbin Yu
1683a2618d rename BUILD.buck to BUCK.oss (#78792)
rename BUILD.buck to BUCK.oss to better reflect that it's the OSS version of BUCK build, not the one shared with Bazel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78792
Approved by: https://github.com/kit1980
2022-06-03 07:23:16 +00:00
PyTorch MergeBot
954522a485 Revert "Autogen Tags enum, and allow specifying tags while defining an op"
This reverts commit 9476a78f37.

Reverted https://github.com/pytorch/pytorch/pull/77313 on behalf of https://github.com/malfet due to Broke OSS buck builds, see 9476a78f37
2022-06-03 01:53:53 +00:00
anjali411
9476a78f37 Autogen Tags enum, and allow specifying tags while defining an op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77313

Approved by: https://github.com/ezyang, https://github.com/albanD
2022-06-03 01:13:44 +00:00
PyTorch MergeBot
fca1f495c2 Revert "Port index.Tensor to structured kernels."
This reverts commit 9fe6f1baf5.

Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: 9fe6f1baf5
2022-06-01 00:12:15 +00:00
Brian Hirsh
9fe6f1baf5 Port index.Tensor to structured kernels.
Tracking issue: #55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607

Approved by: https://github.com/bdhirsh
2022-05-31 22:15:20 +00:00
Antonio Kim
fe67dff82a Deprecate TSNodeLoweringInterface (#78273)
Fixes #78206

Deprecate `TSNodeLoweringInterface` and refactor lower functions into IR nodes.

CC: @wconstab @desertfire
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78273
Approved by: https://github.com/wconstab
2022-05-31 18:09:12 +00:00
Brian Hirsh
92229adf0c add special handling for resize_() in functionalization pass
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77714

Approved by: https://github.com/ezyang
2022-05-26 16:15:44 +00:00
Bin Bao
29189d2ba8 [LT] Add IR resuing support for manually-implemented ops
Summary: Add CanBeReused methods for manually-implemented ops and replace MakeNode with
ReuseOrMakeNode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77616

Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-05-26 16:04:47 +00:00
Hui Guo
1803a592f4 [static_runtime] Add script to auto-generate view ops (#77105)
Summary:
Add script to go through view ops in "native_functions.yaml" and auto-register them into static runtime and auto-generate op unit tests for each.

Overall there are 96 grouped view ops, among which 21 is already registered by hand; 9 (including sparse ops/training related ops etc.) are not the target of static runtime; 30 has list args or list ret; and 7 has non-basic types such as "Dimname", "MemoryFormat", etc. In summary, this script auto-generate 29 view ops for now.

Run `buck run //caffe2/torch/fb/jit:gen_static_runtime_ops` to generate static runtime ops, and the results with this script are,

```
total grouped native ops: 1582
grouped native ops with out variant: 548
generated functions groups with out variant: 241

view grouped native ops: 96
generated functions view groups: 29

overall generated : 270
```

The generated view ops are added in D36258968

Test Plan:
Generate static runtime ops: `buck run //caffe2/torch/fb/jit:gen_static_runtime_ops`

Unit tests: `buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Differential Revision: D36258767

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77105
Approved by: https://github.com/mikeiovine
2022-05-26 03:12:22 +00:00
Antonio Kim
02c4d877b4 Codegen Non-Native IR Nodes (#76535)
Add codegen infrastructure to generate IR nodes for non-native ops.

The proposed change is to add a `non_native` key to the `{backend}_native_functions.yaml` file that contains schema definitions similar to what is found in `native_functions.yaml`. e.g.
```
non_native:
    ...
    - func: expand(Tensor input, int[] size, bool is_scalar_expand) -> Tensor
    ...
```
these definitions are parsed into a `LazyIrSchema` that can be used for generating IR nodes using `GenLazyIR`.

Fixes #74628

CC: @wconstab @desertfire @henrytwo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76535
Approved by: https://github.com/wconstab
2022-05-24 19:29:23 +00:00
Brian Hirsh
7ddc1425ff functionalization fix for inplace comparison ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77125

Approved by: https://github.com/ezyang
2022-05-24 18:20:31 +00:00
Brian Hirsh
22d566acda functionalization fix for inplace_view ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77126

Approved by: https://github.com/ezyang
2022-05-24 18:20:30 +00:00
Mengwei Liu
9e806619cc [Codegen] Remove view operator check in NativeFunctionGroups and allow skipping native function generation (#78145)
Summary:
This PR adds two features:
* A boolean to turn off native function generation in codegen
* Relaxing `view` operator check for `NativeFunctionGroups`

Differential Revision: D36604646

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78145
Approved by: https://github.com/iseeyuan, https://github.com/bdhirsh
2022-05-24 05:48:30 +00:00
Mengwei Liu
ffa3cce100 [Codegen] Expose namespace argument for static dispatch (#77710)
For static dispatch we are hardcoding namespace to be `at` for backend-specific C++ functions, e.g., `at::cpu::add()`. We are extending it to accept namespaces from callsite. This is a temporary solution, in the long run we want to introduce custom namespace into codegen system, e.g., we should be able to add `at::` to `native_functions.yaml` and parse it into `NativeFunction`. This needs a bit more design.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77710
Approved by: https://github.com/ezyang
2022-05-21 00:39:06 +00:00
John Clow
417373337f Put imports in correct order so clang-format doesn't get mad every time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77282

Approved by: https://github.com/Krovatkin
2022-05-20 18:39:47 +00:00
Brian Hirsh
0161e9eb00 [test] attempt to functionalize ops with mutable positional-only args
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76320

Approved by: https://github.com/ezyang
2022-05-19 18:50:34 +00:00
Edward Z. Yang
befa4e371e Fix typo
Fixes #77412

Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77488

Approved by: https://github.com/mruberry
2022-05-18 18:25:54 +00:00
Antonio Kim
55be35ae39 Fix 'Code below assumes there is at least one tensor arg' assumption (#76917)
Previously when codegening ops like `zeros_` or `ones_` we'd hit a `Code below assumes there is at least one tensor arg error`. This check is not entirely correct which is what is causing the error to be thrown. There are ops like the ones mentioned that pass in a `device` parameter that can be used in place of the "first tensor".

CC: @wconstab @desertfire @henrytwo @ke1337
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76917
Approved by: https://github.com/desertfire
2022-05-18 17:58:47 +00:00
John Clow
2a99018147 Adding a way to register both upper and lower bound functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77388

Approved by: https://github.com/eellison
2022-05-18 17:34:07 +00:00
Brian Hirsh
edc904d6ba add native view_copy.out ops, teach codegen about tensorlist out=
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76126

Approved by: https://github.com/ezyang
2022-05-18 14:23:43 +00:00
Yukio Siraichi
9d44250760 Reduce structured kernels' set_output boilerplate with new overloads.
Partially fix #69813
This PR does mainly 3 things:

1. Introduces new methods for the `MetaBase` API:
    - `set_output_strided`: creates proxy tensors with exact strides, if strides don't match
    - `set_output_contiguous`: alias for `set_output_strided` with contiguous strides
    - `set_output_raw_strided`: does not create proxy tensors

2. Modifies codegen for handling proxy tensors:
    - Creates a new field for out-of-place kernels: `proxy_output_`
    - Implements `set_output_strided` by creating a proxy tensor if necessary
    - Passes the proxy tensor to them `IMPL` function
    - Copy the result back to the real output, in the end, whenever a proxy was created

3. Replace `set_output` by `set_output_raw_strided` for `TensorIterator*`
    - Needed, since it overrides `set_output`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76096

Approved by: https://github.com/ezyang
2022-05-17 12:01:53 +00:00
Linbin Yu
1f8049566f Re-land BUCK build for pytorch mobile (#77612)
see https://github.com/pytorch/pytorch/pull/76480
fixed most lint errors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77612
Approved by: https://github.com/kit1980
2022-05-17 00:30:13 +00:00
Bin Bao
25c6ebd12c Revert "Revert "[LT] Codegen ReuseNode for supported ops""
Summary: Fixed a XLC build failure by generating an always-return-false
default CanBeReused method.

This reverts commit 3cade9d454.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77513

Approved by: https://github.com/alanwaketan
2022-05-16 20:14:42 +00:00
PyTorch MergeBot
530481ed69 Revert "[mobile] add buck build for mobile targets (#76480)"
This reverts commit 168dc70faf.

Reverted https://github.com/pytorch/pytorch/pull/76480 on behalf of https://github.com/atalman
2022-05-16 16:14:17 +00:00
francescocastelli
dca416b578 Pretty-print dataclasses (#76810)
Unfortunately the built-in pprint module support pretty-print of dataclasses only from python 3.10. The code that I wrote in method `__str__` of OpInfo should do the same job and should also work for any dataclass. For now I've put it there but we can create a function and put it somewhere where is accessible also for other dataclasses. Also the max width (80) is now hardcode but it would ideally be the parameter of the function.

when you call print on an OpInfo you get:
```
OpInfo(name = '__getitem__',
       ref = None,
       aliases = (),
       variant_test_name = '',
       op = <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>,
       method_variant = <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>,
       inplace_variant = None,
       skips = (<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbca90>,
                <torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbcae0>),
       decorators = (<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbca90>,
                     <torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbcae0>),
       sample_inputs_func = <function sample_inputs_getitem at 0x7f463acc6af0>,
       reference_inputs_func = None,
       error_inputs_func = None,
       sample_inputs_sparse_coo_func = <function _DecoratorContextManager.__call__.<locals>.decorate_context at 0x7f463acc6b80>,
       sample_inputs_sparse_csr_func = <function _DecoratorContextManager.__call__.<locals>.decorate_context at 0x7f463acc6c10>,
       dtypes = {torch.int16,
                 torch.float64,
                 torch.int32,
                 torch.int64,
                 torch.complex64,
                 torch.float16,
                 torch.bfloat16,
                 torch.uint8,
                 torch.complex128,
                 torch.bool,
                 torch.float32,
                 torch.int8},
       dtypesIfCUDA = {torch.int16,
                       torch.float64,
                       torch.int32,
                       torch.int64,
                       torch.complex64,
                       torch.float16,
                       torch.bfloat16,
                       torch.uint8,
                       torch.complex128,
                       torch.bool,
                       torch.float32,
                       torch.int8},
       dtypesIfROCM = {torch.int16,
                       torch.float64,
                       torch.int32,
                       torch.int64,
                       torch.complex64,
                       torch.float16,
                       torch.bfloat16,
                       torch.uint8,
                       torch.complex128,
                       torch.bool,
                       torch.float32,
                       torch.int8},
       backward_dtypes = {torch.int16,
                          torch.float64,
                          torch.int32,
                          torch.int64,
                          torch.complex64,
                          torch.float16,
                          torch.bfloat16,
                          torch.uint8,
                          torch.complex128,
                          torch.bool,
                          torch.float32,
                          torch.int8},
       backward_dtypesIfCUDA = {torch.int16,
                                torch.float64,
                                torch.int32,
                                torch.int64,
                                torch.complex64,
                                torch.float16,
                                torch.bfloat16,
                                torch.uint8,
                                torch.complex128,
                                torch.bool,
                                torch.float32,
                                torch.int8},
       backward_dtypesIfROCM = {torch.int16,
                                torch.float64,
                                torch.int32,
                                torch.int64,
                                torch.complex64,
                                torch.float16,
                                torch.bfloat16,
                                torch.uint8,
                                torch.complex128,
                                torch.bool,
                                torch.float32,
                                torch.int8},
       supports_out = False,
       supports_autograd = True,
       supports_gradgrad = True,
       supports_fwgrad_bwgrad = True,
       supports_inplace_autograd = False,
       supports_forward_ad = True,
       gradcheck_wrapper = <function OpInfo.<lambda> at 0x7f463a7a40d0>,
       check_batched_grad = True,
       check_batched_gradgrad = True,
       check_batched_forward_grad = True,
       check_inplace_batched_forward_grad = True,
       gradcheck_nondet_tol = 0.0,
       gradcheck_fast_mode = None,
       aten_name = '__getitem__',
       decomp_aten_name = None,
       aten_backward_name = None,
       assert_autodiffed = False,
       autodiff_nonfusible_nodes = ['aten::__getitem__'],
       autodiff_fusible_nodes = [],
       supports_sparse = False,
       supports_scripting = False,
       supports_sparse_csr = False,
       test_conjugated_samples = True,
       test_neg_view = True,
       assert_jit_shape_analysis = False,
       supports_expanded_weight = False)
```

cc @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76810
Approved by: https://github.com/ezyang
2022-05-16 14:20:41 +00:00
Linbin Yu
168dc70faf [mobile] add buck build for mobile targets (#76480)
Create buck targets to replicate internal BUCK build, including
- XNNPACK
- QNNPACK
- C10
- aten_cpu
- torch_mobile_core
- torch_mobile_all_ops
- ptmobile_benchmark

And able to run mobilenet v2 using ptmobile_benchmark (with all ops).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76480
Approved by: https://github.com/seemethere, https://github.com/dreiss
2022-05-15 18:42:41 +00:00
PyTorch MergeBot
3cade9d454 Revert "[LT] Codegen ReuseNode for supported ops"
This reverts commit 6066e5929f.

Reverted https://github.com/pytorch/pytorch/pull/76738 on behalf of https://github.com/malfet
2022-05-14 00:33:10 +00:00
Bin Bao
6066e5929f [LT] Codegen ReuseNode for supported ops
Summary:
1. Update the codegen script to add a TrieCache lookup (ReuseNode)
before creating a new IR node. The following is an example generated
code,

```
    at::Tensor LazyNativeFunctions::add(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {
        ...
        torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha);
        if (!node) {
            auto out_meta = at::meta::add(self, other, alpha);
            std::vector<Shape> shapes{Shape(out_meta.scalar_type(), out_meta.sizes().vec())};
            TORCH_INTERNAL_ASSERT(shapes.size() == 1);
            if(symbolicShapeEnabled()){
                std::vector<jit::IValue> inputs = { self, other, alpha };
                char* schema_str = "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
                applySymbolicShapesOnLT(schema_str, inputs, shapes);
            }

            node = torch::lazy::MakeNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha, std::move(shapes));
            CacheNode(node);
        }
        ...
    }
```
2. TrieCache lookup depends on each IR node subclass to provide its own
comparison function. The following is an example generated code,

```
  bool CanBeReused(const torch::lazy::Value& self, const torch::lazy::Value& other, const torch::lazy::Value& alpha) const {
    size_t i = 0;
    return (operand(i++) == self &&
        operand(i++) == other &&
        operand(i++) == alpha);
  }
```

3. DeviceData is specially handled.

4. Non-codegen op changes are coming a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76738

Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-05-13 19:13:58 +00:00
Kulin Seth
e011a8e18b Enable PyTorch operations on MPS Backend. (#77343)
Add PyTorch operations to MPS backend.

- https://github.com/pytorch/pytorch/issues/77394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343
Approved by: https://github.com/albanD
2022-05-13 18:28:53 +00:00
JackCaoG
e36a8c1f13 Lazy codegen change for xla (#76717)
Codegen change to enable PyTorch/XLA to generate the first op in https://github.com/pytorch/xla/pull/3544.

@bdhirsh @wconstab
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76717
Approved by: https://github.com/Krovatkin
2022-05-12 17:04:04 +00:00
Brian Hirsh
47dd092bae add a new at::lift operator, fix torch.tensor for functionalization
This reverts commit 85bd65a880.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77285

Approved by: https://github.com/albanD, https://github.com/ezyang
2022-05-12 13:31:19 +00:00
PyTorch MergeBot
85bd65a880 Revert "[test] try to fix torch.tensor for functionalization"
This reverts commit 9edee09ed6.

Reverted https://github.com/pytorch/pytorch/pull/76319 on behalf of https://github.com/janeyx99
2022-05-11 18:48:42 +00:00
Brian Hirsh
9edee09ed6 [test] try to fix torch.tensor for functionalization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76319

Approved by: https://github.com/ezyang
2022-05-11 17:27:34 +00:00
Kulin Seth
f348b1b2b5 Add the Runtime components for MPS backend. (#76725)
The PR adds the runtime components and few basic operations like copy, as_strided for MPS backend.

Current list of identified TODOs are:

-  https://github.com/pytorch/pytorch/issues/77176
- Unify the logic with CUDACachingAllocator and remove redundant code.
-  https://github.com/pytorch/pytorch/issues/77170
- Look into using C++ smart pointers where possible with ObjC code
- Use empty_strided_generic() to implement the `empty_strided_mps` code
- https://github.com/pytorch/pytorch/issues/77144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76725
Approved by: https://github.com/albanD
2022-05-11 17:19:45 +00:00
Bin Bao
8f5cdc6d5d Revert "Revert "[LT] Store OpKind for each IR subclass in a static field""
Summary: Re-land https://github.com/pytorch/pytorch/pull/76711 by
fixing internal build errors.
Generate class-level opkind as a static method instead of a static
member.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77102

Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/antoniojkim
2022-05-11 12:27:05 +00:00
PyTorch MergeBot
7eaf4780ba Revert "[LT] Store OpKind for each IR subclass in a static field"
This reverts commit ac37ddc795.

Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet
2022-05-09 20:50:09 +00:00
Nikolay Korovaiko
daf8c48a87 Revert "Revert "[WIP] customize the C++ class for valueT"" (#77003)
This reverts commit ec841b0346.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77003
Approved by: https://github.com/shunting314, https://github.com/JackCaoG
2022-05-09 17:40:17 +00:00
PyTorch MergeBot
ec841b0346 Revert "[WIP] customize the C++ class for valueT"
This reverts commit c152817926.

Reverted https://github.com/pytorch/pytorch/pull/76911 on behalf of https://github.com/suo
2022-05-06 22:36:04 +00:00
Nikolay Korovaiko
c152817926 [WIP] customize the C++ class for valueT
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76911
Approved by: https://github.com/wconstab
2022-05-06 21:05:35 +00:00
Bin Bao
ac37ddc795 [LT] Store OpKind for each IR subclass in a static field
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite anonnying when we start to implement the
trie-based IR node reusing. More importantly, the op for each subclass
should be unique for that subclass and thus making it a const static field
is a more logical design.

In this PR, we still keep the object-level op_ for easier XLA adoption. As
furture work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-06 19:14:46 +00:00
Yukio Siraichi
fcf38a5812 Add support to Tensor[]? for structured kernel codegen.
This PR turns the previously introduced `ITensorList` into a more general `IList`
class. It is a container wrapper for arbitrary types (given their appropriate
implementations).

In summary, I have:

- Renamed `ITensorList` (its iterators and macros, for consistency) to `IList`
- Made `IList` a templated function (for an arbitrary type `T`), given that they:
     - Specialize `IListTagImpl<T, Tag>`, for all `IListTag`
- Introduced type aliases (for both list and iterator types):
     - `at::ITensorList` -> `c10::IList<at::Tensor>`
     - `at::IOptTensorRefList` -> `c10::IList<at::OptionalTensorRef>`
- Added support for `Tensor?[]` in the structured codegen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69606

Approved by: https://github.com/ezyang
2022-05-06 14:24:18 +00:00
Peter Bell
6df5a53127 Fix unmarked fstring
This error message currently prints the format string literally, because the string isn't marked with the `f`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76841
Approved by: https://github.com/bdhirsh
2022-05-04 21:18:05 +00:00
Hui Guo
ca0f267022 [Static Runtime] [RFC] Codegen support for ops with unstructured kernels (#76203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76203

Request for comments:

This change adds extra code generator support to generate out variant wrappers for operators with unstructured kernels.

The current version generates 105 new out variant wrappers in addition to the existing 136 auto-generated out variants wrappers.

This change shows that a simple tweak can increase the generated op coverage to 16% (241/1559) among all native ops described in native_functions.yaml no. matter if they are structured or not.

Command to generate out variant wrappers.
```
buck run //caffe2/torch/fb/jit:gen_static_runtime_ops
```
- AFTER this change
```
total grouped native ops: 1559
structured grouped native ops: 545
generated grouped native ops: 241
```

- BEFORE this change
```
total grouped native ops: 1503
structured grouped native ops: 540
generated grouped native ops: 136
```

To enable CI tests and make it easier to review, the generated ops are added in a separate diff: D35945633

More details:
We added a block list to remove the generation of around 10 operations that are deprecated or for which the unit test would fail. All generated ops are well *compiled* but the compiled unittest may not pass due to the lack of hand-picked test input values for certain ops. Among the 42 ops whose unittest does not pass, 1 (op "index_select") is repeated from the existing ops; 32 ops are fixed; and 9 ops are removed and blocked from generation because either it is not being commonly used in internal models such as "cholesky", "linalg_householder_product", sparse kernel "sspaddmm", or it causes some errors in static runtime such as "conj_physical" leads to an error in memory planner, and "binary_cross_entropy".

Test Plan:
OP generation:
```buck run //caffe2/torch/fb/jit:gen_static_runtime_ops```

Test generated ops:
```buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest```

Reviewed By: tenpercent

Differential Revision: D34913736

fbshipit-source-id: a6f408321653c3589ae1c76826177fc403d59c44
(cherry picked from commit 6f4501730478dbaeeea7f3ad4f9d29bf6787e7c1)
2022-05-04 19:34:19 +00:00
Michael Suo
fb0f285638 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 20:51:34 +00:00
John Clow
db21e22b4b [EASY] Quick Fix for broken shape function autogen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76703

Approved by: https://github.com/eellison
2022-05-03 17:34:05 +00:00
Bin Bao
f8a4780eb2 [LT] Move MakeNode into ir_builder.h
Summary: Move MakeNode into ir_builder.h to avoid circular header
reference later when introducing a trie cache for IR node lookup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76482

Approved by: https://github.com/wconstab
2022-05-03 14:53:19 +00:00
Will Constable
d0cb31d5bc Make lazy tensor ptr class customizable (#76476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76476

Test Plan: Imported from OSS

Reviewed By: Krovatkin, bdhirsh

Differential Revision: D35980433

Pulled By: wconstab

fbshipit-source-id: 1d4d00a494bf8aea86278b007f7f353cd7a822f8
(cherry picked from commit a78655bef23b5fa8487ced13443ca0bfdec65e5c)
2022-04-28 03:21:56 +00:00
Will Constable
4cae57080a Make lazy tensor creation and value strings customizable (#76472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76472

- lets XLA backend customize codegenned functions during
  migration to LTC

Test Plan: Imported from OSS

Reviewed By: Krovatkin, bdhirsh

Differential Revision: D35980435

Pulled By: wconstab

fbshipit-source-id: 6138aef20862fccec40d715ffbb5a40a0a7d0401
(cherry picked from commit bad23f4b7ef73ffc2ef4a893364512907e9c4555)
2022-04-28 03:21:56 +00:00
Will Constable
cfc90cf3eb Fix GenLazyIR.node_base_ctor_call (#76471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76471

Make node_base_ctor_call produce the entire node_bace_ctor_call.

Previously it was only producing the beginning of the call, which was unintended.

Addresses part of https://github.com/pytorch/xla/issues/3472

Test Plan: Imported from OSS

Reviewed By: qihqi, ngimel

Differential Revision: D35980436

Pulled By: wconstab

fbshipit-source-id: a443cf593ac7c35b2b65e72b82907e88e1e71c7a
(cherry picked from commit 360ad6d82a7e8303b8a60e61b177dabf0131ea8b)
2022-04-28 03:21:56 +00:00
anjali411
b204ad863f Revert "Revert "Allow specifying tags for aten operators in native_functions.yaml""
This reverts commit ea44645c9a.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76456

Approved by: https://github.com/osalpekar
2022-04-28 02:04:57 +00:00
Brian Hirsh
40d96f0afd Revert "functionalization: add support for zero_()"
This reverts commit 7d44b3675b.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76375

Approved by: https://github.com/datumbox, https://github.com/albanD
2022-04-26 19:27:27 +00:00
Edward Z. Yang
c2ae0b01c0 Reapply black for torchgen, this time with lint to fix!
Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76359

Approved by: https://github.com/suo
2022-04-26 04:03:38 +00:00
Nikolay Korovaiko
bb60cac25a E2E SymInt example narrow_copy
This **roughly** corresponds to Goal 3.2 in https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw/edit#

Namely:

It adds the following:

* SymbolicIntNode interface
* LazySymbolicIntNode implementation
* Lazy `narrow_copy` implementation
* Need add support for SymInt in codegen
* Test (below)

```cpp
TEST(LazyDynamicOpsTest, NarrowCopy) {
  auto x = torch::rand({5, 10, 10}).to(kLazy);
  const size_t Y_DIM = 3;
  const size_t X_DIM_INDEX = 2;
  auto y = torch::rand({Y_DIM}).to(kLazy);
  auto ly = torch::lazy::TryGetLtcTensor(y);
  auto dim_node = MakeNode<SizeNode>(ly->GetIrValue(), 0);
  auto lmn = new torch::lazy::SymbolicIntNode(dim_node);
  auto z = x.narrow_copy(X_DIM_INDEX, 0, lmn->toSymInt());
  AllClose(z.cpu(), x.cpu().narrow_copy(X_DIM_INDEX, 0, Y_DIM));
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75759
Approved by: https://github.com/wconstab
2022-04-26 02:40:27 +00:00
Brian Hirsh
640ce6bc9b functionalization bugfix: using owning type when unwrapping tensors
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76125

Approved by: https://github.com/ezyang
2022-04-25 22:00:19 +00:00
Brian Hirsh
74e93f727a remove _is_foreach_op codegen special cases, clean up mutable return type checks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76190

Approved by: https://github.com/ezyang
2022-04-25 21:34:17 +00:00
Brian Hirsh
5da76acd1d functionalization: add a copy() native function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76083

Approved by: https://github.com/albanD
2022-04-25 21:31:48 +00:00
Brian Hirsh
7d44b3675b functionalization: add support for zero_()
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75913

Approved by: https://github.com/albanD
2022-04-25 21:31:48 +00:00
Priya Ramani
f954c0a774 [Pytorch][4/4 Static dispatch] Support multiple backends with multiple kernels (#76059)
Summary:
- Supports multiple backends with multiple kernels in static dispatch
- Refactor static dispatch generators

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76059
ghstack-source-id: 154735166

Test Plan:
```
(pytorch)  ~/fbsource
└─ $ buck build --config pt.enable_lightweight_dispatch=1 --config pt.static_dispatch_backend="CPU;QuantizedCPU;CompositeExplicitAutograd" //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer
```

Reviewed By: bdhirsh

Differential Revision: D35727473

fbshipit-source-id: 986ba3390c6e585fcf8477b6d069720ee1fbc90b
(cherry picked from commit 6473990c208a78879985e4cdfb50960f5727ad5e)
2022-04-25 21:18:08 +00:00
Edward Yang
36420b5e8c Rename tools/codegen to torchgen (#76275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76275

In preparation for addressing
https://github.com/pytorch/pytorch/issues/73212

Diff was generated with:

```
git mv tools/codegen torchgen
git grep -l 'tools.codegen' | xargs sed -i 's/tools.codegen/torchgen/g'
sed -i "s/\${TOOLS_PATH}\/codegen/\${TORCH_ROOT}\/torchgen/g" caffe2/CMakeLists.txt
```

and a manual edits to:

* tools/test/test_gen_backend_stubs.py
* torchgen/build.bzl
* torchgen/gen_backend_stubs.py

aka this diff:

```
 diff --git a/tools/test/test_gen_backend_stubs.py b/tools/test/test_gen_backend_stubs.py
index 3dc26c6d2d..104054575e 100644
 --- a/tools/test/test_gen_backend_stubs.py
+++ b/tools/test/test_gen_backend_stubs.py
@@ -9,7 +9,7 @@ from torchgen.gen_backend_stubs import run
 from torchgen.gen import _GLOBAL_PARSE_NATIVE_YAML_CACHE  # noqa: F401

 path = os.path.dirname(os.path.realpath(__file__))
-gen_backend_stubs_path = os.path.join(path, '../torchgen/gen_backend_stubs.py')
+gen_backend_stubs_path = os.path.join(path, '../../torchgen/gen_backend_stubs.py')

 # gen_backend_stubs.py is an integration point that is called directly by external backends.
 # The tests here are to confirm that badly formed inputs result in reasonable error messages.
 diff --git a/torchgen/build.bzl b/torchgen/build.bzl
index ed04e35a43..d00078a3cf 100644
 --- a/torchgen/build.bzl
+++ b/torchgen/build.bzl
@@ -1,6 +1,6 @@
 def define_targets(rules):
     rules.py_library(
-        name = "codegen",
+        name = "torchgen",
         srcs = rules.glob(["**/*.py"]),
         deps = [
             rules.requirement("PyYAML"),
@@ -11,6 +11,6 @@ def define_targets(rules):

     rules.py_binary(
         name = "gen",
-        srcs = [":codegen"],
+        srcs = [":torchgen"],
         visibility = ["//visibility:public"],
     )
 diff --git a/torchgen/gen_backend_stubs.py b/torchgen/gen_backend_stubs.py
index c1a672a655..beee7a15e0 100644
 --- a/torchgen/gen_backend_stubs.py
+++ b/torchgen/gen_backend_stubs.py
@@ -474,7 +474,7 @@ def run(
 ) -> None:

     # Assumes that this file lives at PYTORCH_ROOT/torchgen/gen_backend_stubs.py
-    pytorch_root = pathlib.Path(__file__).parent.parent.parent.absolute()
+    pytorch_root = pathlib.Path(__file__).parent.parent.absolute()
     template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates")

     def make_file_manager(install_dir: str) -> FileManager:
```

run_all_fbandroid_tests

Test Plan: sandcastle

Reviewed By: albanD, ngimel

Differential Revision: D35770317

fbshipit-source-id: 153ac4a7fef15b1e750812a90bfafdbc8f1ebcdf
(cherry picked from commit c6d485d1d4648fa1c8a4c14c5bf3d8e899b9b4dd)
2022-04-25 01:38:06 +00:00