Currently any function with a default dtype other than None has to be
manually entered into this function. Instead, this reads the default
directly from `native_functions.yaml`. In order to do this, I also
change `PythonSignatureGroup` to take `tensor_options_args` from the
functional variant since the out variant doesn't actually have tensor
options arguments to take the default values from.
Also note that we need to use `default_init` instead of `default`,
because the out variant doesn't have a `tensor_options` argument to
extract the default value from, and the `PythonSignature` objects
wouldn't match otherwise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81479
Approved by: https://github.com/albanD
The composite kernel we generate for `view_copy` is special-cased a bit for efficiency, to avoid extra clones in some cases.
That logic was slightly wrong, though, and is fixed here (it needs to mirror the logic in `reshape()`).
The bug manifested as a debug assert firing for Lazy Tensor, which I confirmed no longer fires when running this script:
```
# ran with "python test_ltc_only_torch.py --device=lazy --sync=1 --nvtx=1"
import torch
import torch._lazy
from torch._lazy.ts_backend import init as init_ts_backend
init_ts_backend()
torch.manual_seed(42)
from transformers import BertForSequenceClassification
def parse_args():
    import argparse
    parser = argparse.ArgumentParser(description='')
    parser.add_argument('--device', type=str, default='cuda')
    parser.add_argument('--sync', type=bool, default=False)
    parser.add_argument('--nvtx', type=bool, default=False)
    return parser.parse_args()
args = parse_args()
device = args.device
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)
from transformers import AdamW
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text_batch = ["I love Pixar.", "I don't care for Pixar."]
encoding = tokenizer(text_batch, return_tensors='pt', padding=True, truncation=True)
input_ids = encoding['input_ids'].to(device)
attention_mask = encoding['attention_mask'].to(device)
model = model.to(device)
model.train()
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
{'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
{'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)
labels = torch.tensor([1,0]).unsqueeze(0).to(device)
for _ in range(6):
    torch.cuda.nvtx.range_push(f'Iter{_}')
    torch.cuda.nvtx.range_push('F')
    outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
    if args.sync:
        torch._lazy.mark_step()
        torch._lazy.wait_device_ops()
    torch.cuda.nvtx.range_pop()
    loss = outputs.loss
    torch.cuda.nvtx.range_push('B')
    optimizer.zero_grad()
    loss.backward()
    if args.sync:
        torch._lazy.mark_step()
        torch._lazy.wait_device_ops()
    torch.cuda.nvtx.range_pop()
    torch.cuda.nvtx.range_push('O')
    optimizer.step()
    if args.sync:
        torch._lazy.mark_step()
        torch._lazy.wait_device_ops()
    torch.cuda.nvtx.range_pop()
    torch.cuda.nvtx.range_pop()
torch._lazy.mark_step()
torch._lazy.wait_device_ops()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81553
Approved by: https://github.com/ezyang
Summary:
In #77710 I introduced a hack to allow static dispatch to take namespaces. Now that ops and kernels carry their own namespaces, we no longer have to pass a namespace into `static_dispatch()`; instead we generate the ops in `Functions.h` using the kernel's namespace. After this diff:
If we have a yaml file looking like this:
```
- func: op_1(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: at::op_1_kernel # ATen kernel

- func: op_2(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: custom::op_2_kernel # custom kernel
```
`Functions.h` will contain the following C++ APIs:
```
TORCH_API inline at::Tensor & op_1(at::Tensor & self) {
  return at::cpu::op_1_kernel(self);
}
TORCH_API inline at::Tensor & op_2(at::Tensor & self) {
  return custom::cpu::op_2_kernel(self);
}
```
Test Plan: Rely on CI
Differential Revision: D37900753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81581
Approved by: https://github.com/iseeyuan
This PR is doing a few interrelated things, all of which are necessary to get correctness. Read the comment in torch/fx/experimental/proxy_tensor.py for the high level overview.
Let's break down the parts of this PR:
* Bug fix where `enable_torch_dispatch_mode` with `None` doesn't work. This makes `enable_torch_dispatch_mode(current_mode.inner)` work, which is the basis for how we temporarily disable fake tensor mode.
* Bug fix for when fake tensor mode is combined with a non-mode tensor subclass. This could actually be ablated from this PR, but it affects where the logic for allowing non-fake-tensor inputs with `lift` goes, so it's all in here in one go. There are some relevant tests for the fix in fake tensor, but it turns out I didn't need this because I'm always using proxy tensors as a mode (which ensures the ordering is right).
* New `lift_fresh` view operator. Note that, like `lift`, we have to manually write the functionalize kernel for these functions.
* The actual change: save constants when we see them in proxy tensor mode, and then propagate them as we go (because otherwise you'll handle mutations on constants incorrectly; see the test, and the sketch after this list).
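A minimal sketch of the constant-handling behavior (assuming a build where `make_fx` is importable from `torch.fx.experimental.proxy_tensor`; the lift op may show up as `lift_fresh` or `lift_fresh_copy` depending on the version, and the test in this PR is more thorough):
```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    # From the tracer's point of view `c` is a constant tensor: proxy tensor
    # mode saves it and inserts a lift op so later ops on it are tracked.
    c = torch.tensor([1.0, 2.0, 3.0])
    c.mul_(2)      # mutating a traced constant must not corrupt the original
    return x + c

gm = make_fx(f)(torch.randn(3))
print(gm.code)     # the constant appears as an attribute plus a lift_fresh* call
```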
This is mildly BC-breaking if anyone was previously interposing on
`at::lift`, but that operator is relatively new, and I checked
functorch, which has no explicit reference to `lift`. So I think it
should not be too disruptive.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81192
Approved by: https://github.com/samdow, https://github.com/bdhirsh
There are small typos in:
- caffe2/python/recurrent.py
- test/distributed/test_c10d_nccl.py
- test/test_fx.py
- torch/csrc/jit/runtime/autodiff.cpp
- torchgen/gen.py
Fixes:
- Should read `propagation` rather than `propogation`.
- Should read `multiplied` rather than `multuplied`.
- Should read `eliminate` rather than `elminate`.
- Should read `dispatcher` rather than `disaptcher`.
Semi-automated pull request generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81435
Approved by: https://github.com/ngimel
Summary:
A followup to #78015 and #79733. In those PRs I introduced custom namespace support into:
* `Register<DispatchKey>.cpp`
* `RegisterSchema.cpp`
* `NativeFunctions.h`
This PR extracts out logic that generates schema registration code (used in `RegisterSchema.cpp`) into a function so that it can be easily tested and reused. Added unit test to cover the logic as well.
Test Plan: Rely on newly added unit tests.
Differential Revision: D37581186
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80780
Approved by: https://github.com/iseeyuan
`ProxyTorchDispatchMode` was added recently as part of `make_fx`, which was secretly causing the meta tensor calls used inside of functionalization to get baked into the graph. It also wasn't caught because the functionalization tests in core don't use `make_fx`, and the tests in functorch aren't as comprehensive.
Now that `make_fx` is in core, I also ported the functionalization test infra over to use it, which would have caught the regression. This also makes the tests cleaner, since mode-based tracing lets us pick up factory functions in the trace output.
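Roughly the kind of test this enables, sketched with the public `torch.func.functionalize` wrapper from later releases rather than the exact test-infra helpers in this PR:
```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch.func import functionalize  # torch.func API from later releases

def f(x):
    tmp = torch.zeros(2)   # factory call: picked up by mode-based tracing
    tmp.add_(x)            # in-place op: rewritten to a functional op
    return tmp

gm = make_fx(functionalize(f))(torch.ones(2))
print(gm.code)  # graph contains the zeros factory plus a functional add, no add_
```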
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80416
Approved by: https://github.com/ezyang, https://github.com/albanD
This should fix the last issue that @anijain2305 hit when running ResNet with TorchDynamo <> functionalization.
Today, if you try to call an `OpOverloadPacket` from Python with some arguments, we use the types of those arguments to perform overload resolution. With some functional variants of ops, this can be ambiguous.
Today this affects just one op, `_fused_moving_avg_obs_fq_helper`, although it could potentially affect e.g. `native_batch_norm` in the future.
Example:
```
# There are technically two overloads:
# torch.ops.aten._fused_moving_avg_obs_fq_helper.default (returns 2 arguments, mutates 4 of its inputs in place)
# torch.ops.aten._fused_moving_avg_obs_fq_helper.functional (returns 6 arguments, mutates none of its inputs)
# We pick the wrong one - no way to know that we should pick the functional one, just from the call site.
outs = torch.ops.aten._fused_moving_avg_obs_fq_helper(a, a, a, a, a, a, a, 1.0, 0, 1, 0)
# raises an error - tries to call the overload with only 2 returns
return _fused_moving_avg_obs_fq_helper_functional[5]
```
Specifically, functionalization bakes `_fused_moving_avg_obs_fq_helper.functional` into the graph, but when AOTAutograd tries to compile with TorchScript it has to strip the overload name (TS doesn't know how to parse overload names directly, so we remove them and let TS infer the right overload at runtime), and at that point it picks the wrong one.
The situation is pretty similar to inplace: `ops.aten.add` and `ops.aten.add_` represent two different `OverloadPacket` objects; they can't be overloads of the same op, because their schemas would be ambiguous (the alias annotations are different, but that isn't enough to disambiguate).
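For reference, the inplace analogy is visible directly from Python (a quick illustration; the `overloads()` helper on `OpOverloadPacket` is assumed to be available in your build):
```python
import torch

# add and add_ are two distinct OpOverloadPacket objects,
# not two overloads inside the same packet.
print(torch.ops.aten.add)               # aten.add
print(torch.ops.aten.add_)              # aten.add_
print(torch.ops.aten.add.overloads())   # e.g. ['Tensor', 'Scalar', 'out', ...]

# A specific overload can always be selected explicitly, sidestepping resolution:
x = torch.randn(2)
y = torch.ops.aten.add.Tensor(x, x)
```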
In this PR, I try to fix the situation in a pretty similar way to how we handle `inplace` in the data model: `inplace` ops get their own base operator name, but they are represented as a flag inside of `BaseOperatorName` in the data model.
Two other important changes that I made as part of this PR:
(1) Originally, there were ~100 different `*_functional` operators: e.g. we had operators named `resize.functional` and `zero.functional`. The `_functional` bit isn't actually necessary in most cases: it's only necessary for operators that **also** have a `SchemaKind.mutable` variant, where `_fused_moving_avg_obs_fq_helper` is the only op that fits that description today. So I removed the unnecessary notion of "functional" from those other ops. I also added a bunch of assertions to force this restriction.
I think that makes more sense in the long run, because it eliminates an unnecessary difference in the model. E.g. we don't have `add_.Tensor` and `add.Tensor_functional`. We just have `add_.Tensor` and `add.Tensor`.
(2) I noticed that we actually still weren't pairing up a bunch of `_foreach` operators correctly, because their input arguments were different (`self` vs. `tensors`). Since they're private APIs, I went ahead and changed the argument names directly so they get matched up. Before this PR, we were generating a separate `_foreach_add` and `_foreach_add.functional` variant in a bunch of cases that really did the same thing (but happened to have a different name for the first argument).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80556
Approved by: https://github.com/ezyang, https://github.com/albanD
In essence, this is a composite of `eq` -> `all` -> `item`.
`native/mps/operators/Equal.cpp` is an almost verbatim copy of `native/cuda/Equal.cpp`.
Also fixes codegen by generating the `MPSFunctions` headers.
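Conceptually, the composite does something like the following rough Python equivalent (ignoring the extra checks the real kernel performs):
```python
import torch

def equal_like(a: torch.Tensor, b: torch.Tensor) -> bool:
    # torch.equal returns False outright on shape mismatch; eq would broadcast.
    if a.shape != b.shape:
        return False
    return torch.eq(a, b).all().item()
```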
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80195
Approved by: https://github.com/albanD
This expands the `AT_DISPATCH` macros to enable writing your own
`AT_DISPATCH_SWITCH` statements with multiple `AT_DISPATCH_CASE`
labels. So, where previously you may have written:
```cpp
if (iter.common_dtype() == kBool) {
  my_bool_kernel(iter);
} else {
  AT_DISPATCH_INTEGRAL_TYPES(iter.common_dtype(), "my_kernel", [&] {
    ...
  });
}
```
You can now instead write
```cpp
AT_DISPATCH_SWITCH(iter.common_dtype(), "my_kernel",
  AT_DISPATCH_CASE(kBool, [&] { my_bool_kernel(iter); })
  AT_DISPATCH_CASE_INTEGRAL_TYPES([&] { ... })
);
```
The macro syntax is a bit ugly; however, the benefits are:
- Greater flexibility, as the kernel code doesn't have to be shared
for all dtypes.
- Selective build and RECORD_KERNEL_FUNCTION work even for single
dtype specializations such as the bool case in the example.
- The compiler sees a single switch for all types, which should be
easier to optimize into a jump table.
- We also now get errors if the same scalar type is handled twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79978
Approved by: https://github.com/ezyang
When constructing a lazy::Node, [null operands (optional values that aren't included) are dropped](30fb2c4aba/torch/csrc/lazy/core/ir.cpp (L82-L84)), so it’s possible for the stored operand list to be a different length than the one that was passed into the constructor.
This can become a problem during the call to `CanBeReused` in the autogen `LazyIr.h` code. For example:
```
bool CanBeReused(const torch::lazy::Value& input, const c10::optional<torch::lazy::Value>& weight, const c10::optional<torch::lazy::Value>& bias, const c10::optional<torch::lazy::Value>& running_mean, const c10::optional<torch::lazy::Value>& running_var, const bool& training, const double& momentum, const double& eps) const {
  size_t i = 0;
  std::cout << "Num operands: " << operands().size() << std::endl;
  return (operand(i++) == input &&
          operand(i++) == weight.value_or(kNullValue) &&
          operand(i++) == bias.value_or(kNullValue) &&
          operand(i++) == running_mean.value_or(kNullValue) &&
          operand(i++) == running_var.value_or(kNullValue) &&
          this->training == training &&
          this->momentum == momentum &&
          this->eps == eps);
}
```
Here we operate under the assumption that the number of operands stored in the `lazy::Node` is equal to the number of operands originally passed into the constructor. Recall that we drop any null operands though, so it’s possible to inadvertently access an invalid index at this point.
This PR addresses the issue by adding a new `nullable_operand` method, which falls back to a null value instead of producing an index error when going out of bounds.
This should solve the issue found at https://github.com/pytorch/pytorch/pull/79637#issuecomment-1162044545
cc: @antoniojkim @ke1337 @wconstab @desertfire
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80060
Approved by: https://github.com/desertfire
Due to implicit conversion shenanigans, having both IntArrayRef
and SymIntArrayRef overloads makes {} ambiguous. While we could
fix this by making a single unified type that accepts all the overloads
we want, an easier fix was to just push the SymIntArrayRef overload
to its own name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79281
Approved by: https://github.com/suo
Summary:
Adding a feature to allow users to specify namespaces for operators and kernels.
# Feature
There's a feature request to allow the DSL to:
1. take an operator namespace other than `aten`.
2. take a kernel that lives in a namespace other than `at::native`.
For both features, we only allow a single level of namespace for the sake of simplicity. If the user specifies `custom::function` as the kernel, the codegen will depend on `custom::native::function`, where `native` is hardcoded.
# Proposal
For feature 1, add a `namespace` attribute to the data class `NativeFunction`. The namespace is extracted by matching the pattern "::" on the `func` variable. For `NativeFunctionsGroup` there's an assumption that all variants (function, inplace, out) have the same namespace. By default (if not specified) the namespace is `aten`.
For feature 2, add a `namespace` attribute to the `BackendMetadata` class, similarly matching the pattern "::" on the kernel field, and remove the `cpp_namespace` field from the `register_dispatch_key` data class. By default (if not specified) the namespace for a kernel is `at::native`.
Test Plan:
Example yaml entries:
```
- func: custom::gelu.out(Tensor self, *, str approximate='none', Tensor(a!) out) -> Tensor(a!)
  structured: True
  structured_inherits: TensorIteratorBase
  device_check: NoCheck # TensorIterator
  python_module: nn
  dispatch:
    CPU: custom::gelu_out_cpu
    CUDA: custom::gelu_out_cuda
    MPS: custom::gelu_out_mps

- func: custom::gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!)
  structured_delegate: gelu.out
  device_check: NoCheck # TensorIterator
  python_module: nn
  dispatch:
    NestedTensorCPU, NestedTensorCUDA: custom::NestedTensor_gelu_

- func: custom::gelu(Tensor self, *, str approximate='none') -> Tensor
  structured_delegate: gelu.out
  device_check: NoCheck # TensorIterator
  python_module: nn
  dispatch:
    MkldnnCPU: custom::mkldnn_gelu
    QuantizedCPU: custom::gelu_quantized_cpu
    NestedTensorCPU, NestedTensorCUDA: custom::NestedTensor_gelu
```
see generated code:
`RegisterCPU.cpp`:
```
TORCH_LIBRARY_IMPL(aten, CPU, m) {
  ...
}

TORCH_LIBRARY_IMPL(custom, CPU, m) {
  m.impl("gelu", TORCH_FN(wrapper_gelu));
  m.impl("gelu.out", TORCH_FN(wrapper_gelu_out_out));
  m.impl("gelu_", TORCH_FN(wrapper_gelu_));
};
```
```
struct structured_gelu_out_cpu_inplace final : public custom::native::structured_gelu_out_cpu {
  structured_gelu_out_cpu_inplace(Tensor& self) : outputs_{std::ref(self)} {}

  void set_output_strided(
      int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
      TensorOptions options, DimnameList names
  ) override {
    const auto& out = outputs_[output_idx].get();
    check_inplace(out, sizes, options);
    auto maybe_proxy = maybe_create_proxy(out, sizes, strides, options);
    if (C10_UNLIKELY(maybe_proxy.has_value())) {
      proxy_outputs_[output_idx] = c10::ExclusivelyOwned<Tensor>(std::move(maybe_proxy).value());
    }
    if (!names.empty()) {
      namedinference::propagate_names(outputs_[output_idx], names);
    }
    // super must happen after, so that downstream can use maybe_get_output
    // to retrieve the output
    custom::native::structured_gelu_out_cpu::set_output_raw_strided(output_idx, sizes, strides, options, names);
  }

  void set_output_raw_strided(
      int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
      TensorOptions options, DimnameList names
  ) override {
    const auto& out = outputs_[output_idx].get();
    check_inplace(out, sizes, options);
    if (!names.empty()) {
      namedinference::propagate_names(outputs_[output_idx], names);
    }
    // super must happen after, so that downstream can use maybe_get_output
    // to retrieve the output
    custom::native::structured_gelu_out_cpu::set_output_raw_strided(output_idx, sizes, strides, options, names);
  }

  const Tensor& maybe_get_output(int64_t output_idx) override {
    return proxy_outputs_[output_idx].has_value() ? **proxy_outputs_[output_idx] : outputs_[output_idx].get();
  }

  std::array<std::reference_wrapper<Tensor>, 1> outputs_;
  std::array<c10::optional<c10::ExclusivelyOwned<Tensor>>, 1> proxy_outputs_;
};
```
`RegisterSchema.cpp`
```
TORCH_LIBRARY(aten, m) {
  ...
}

TORCH_LIBRARY(custom, m) {
  m.def("gelu.out(Tensor self, *, str approximate='none', Tensor(a!) out) -> Tensor(a!)");
  m.def("gelu_(Tensor(a!) self, *, str approximate='none') -> Tensor(a!)");
  m.def("gelu(Tensor self, *, str approximate='none') -> Tensor");
};
```
Differential Revision: D36558459
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78015
Approved by: https://github.com/bdhirsh
Package config/template files with torchgen
This PR packages native_functions.yaml, tags.yaml and ATen/templates
with torchgen.
This PR:
- adds a step to setup.py to copy the relevant files over into torchgen
- adds a docstring for torchgen (so `import torchgen; help(torchgen)`
says something)
- adds a helper function in torchgen so you can get the torchgen root
directory (and figure out where the packaged files are)
- changes some scripts to explicitly pass the location of torchgen,
which will be helpful for the first item in the Future section.
Future
======
- torchgen, when invoked from the command line, should use sources
in torchgen/packaged instead of aten/src. I'm unable to do this because
people (aka PyTorch CI) invoke `python -m torchgen.gen` without
installing torchgen.
- the source of truth for all of these files should be in torchgen.
This is a bit annoying to execute on due to potential merge conflicts
and dealing with merge systems
- CI and testing. The way things are set up right now is really fragile,
we should have a CI job for torchgen.
Test Plan
=========
I ran the following locally:
```
python -m torchgen.gen -s torchgen/packaged
```
and verified that it outputted files.
Furthermore, I did a setup.py install and checked that the files are
actually being packaged with torchgen.
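A rough way to check the packaging from Python; the `packaged` subdirectory layout shown here is an assumption based on how recent wheels ship these files:
```python
import os
import torchgen

# Locate the installed torchgen package and the files copied in by setup.py.
root = os.path.dirname(torchgen.__file__)
yaml_path = os.path.join(root, "packaged", "ATen", "native", "native_functions.yaml")
print(yaml_path, os.path.exists(yaml_path))
```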
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78942
Approved by: https://github.com/ezyang
Summary:
Add a script to go through the view ops in `native_functions.yaml`, auto-register them in static runtime, and auto-generate an op unit test for each.
Overall there are 96 grouped view ops: 21 are already registered by hand; 9 (including sparse ops, training-related ops, etc.) are not targets for static runtime; 30 have list args or list returns; and 7 have non-basic argument types such as `Dimname` or `MemoryFormat`. In summary, this script auto-generates 29 view ops for now.
Run `buck run //caffe2/torch/fb/jit:gen_static_runtime_ops` to generate the static runtime ops; the results with this script are:
```
total grouped native ops: 1582
grouped native ops with out variant: 548
generated functions groups with out variant: 241
view grouped native ops: 96
generated functions view groups: 29
overall generated : 270
```
The generated view ops are added in D36258968
Test Plan:
Generate static runtime ops: `buck run //caffe2/torch/fb/jit:gen_static_runtime_ops`
Unit tests: `buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Differential Revision: D36258767
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77105
Approved by: https://github.com/mikeiovine
Add codegen infrastructure to generate IR nodes for non-native ops.
The proposed change is to add a `non_native` key to the `{backend}_native_functions.yaml` file that contains schema definitions similar to what is found in `native_functions.yaml`. e.g.
```
non_native:
...
- func: expand(Tensor input, int[] size, bool is_scalar_expand) -> Tensor
...
```
these definitions are parsed into a `LazyIrSchema` that can be used for generating IR nodes using `GenLazyIR`.
Fixes #74628
CC: @wconstab @desertfire @henrytwo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76535
Approved by: https://github.com/wconstab
For static dispatch we currently hardcode the namespace of backend-specific C++ functions to `at`, e.g., `at::cpu::add()`. This PR extends it to accept a namespace from the call site. This is a temporary solution; in the long run we want to introduce custom namespaces into the codegen system, e.g., we should be able to add `at::` to `native_functions.yaml` and parse it into `NativeFunction`. That needs a bit more design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77710
Approved by: https://github.com/ezyang
Previously, when codegening ops like `zeros_` or `ones_`, we'd hit a `Code below assumes there is at least one tensor arg` error. The check is not entirely correct, which is what causes the error to be thrown: ops like the ones mentioned pass in a `device` parameter that can be used in place of the "first tensor".
CC: @wconstab @desertfire @henrytwo @ke1337
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76917
Approved by: https://github.com/desertfire
Partially fixes #69813
This PR does mainly 3 things:
1. Introduces new methods for the `MetaBase` API:
- `set_output_strided`: creates proxy tensors with exact strides, if strides don't match
- `set_output_contiguous`: alias for `set_output_strided` with contiguous strides
- `set_output_raw_strided`: does not create proxy tensors
2. Modifies codegen for handling proxy tensors:
- Creates a new field for out-of-place kernels: `proxy_output_`
- Implements `set_output_strided` by creating a proxy tensor if necessary
- Passes the proxy tensor to the `IMPL` function
- Copies the result back to the real output at the end, whenever a proxy was created
3. Replaces `set_output` with `set_output_raw_strided` for `TensorIterator*`
- Needed, since it overrides `set_output`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76096
Approved by: https://github.com/ezyang
Unfortunately the built-in pprint module supports pretty-printing of dataclasses only from Python 3.10. The code I wrote in the `__str__` method of OpInfo does the same job and should also work for any dataclass. For now I've put it there, but we could create a function and put it somewhere accessible to other dataclasses as well. Also, the max width (80) is currently hardcoded, but ideally it would be a parameter of the function.
When you call `print` on an OpInfo you get:
```
OpInfo(name = '__getitem__',
ref = None,
aliases = (),
variant_test_name = '',
op = <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>,
method_variant = <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>,
inplace_variant = None,
skips = (<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbca90>,
<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbcae0>),
decorators = (<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbca90>,
<torch.testing._internal.common_methods_invocations.DecorateInfo object at 0x7f463acbcae0>),
sample_inputs_func = <function sample_inputs_getitem at 0x7f463acc6af0>,
reference_inputs_func = None,
error_inputs_func = None,
sample_inputs_sparse_coo_func = <function _DecoratorContextManager.__call__.<locals>.decorate_context at 0x7f463acc6b80>,
sample_inputs_sparse_csr_func = <function _DecoratorContextManager.__call__.<locals>.decorate_context at 0x7f463acc6c10>,
dtypes = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
dtypesIfCUDA = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
dtypesIfROCM = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
backward_dtypes = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
backward_dtypesIfCUDA = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
backward_dtypesIfROCM = {torch.int16,
torch.float64,
torch.int32,
torch.int64,
torch.complex64,
torch.float16,
torch.bfloat16,
torch.uint8,
torch.complex128,
torch.bool,
torch.float32,
torch.int8},
supports_out = False,
supports_autograd = True,
supports_gradgrad = True,
supports_fwgrad_bwgrad = True,
supports_inplace_autograd = False,
supports_forward_ad = True,
gradcheck_wrapper = <function OpInfo.<lambda> at 0x7f463a7a40d0>,
check_batched_grad = True,
check_batched_gradgrad = True,
check_batched_forward_grad = True,
check_inplace_batched_forward_grad = True,
gradcheck_nondet_tol = 0.0,
gradcheck_fast_mode = None,
aten_name = '__getitem__',
decomp_aten_name = None,
aten_backward_name = None,
assert_autodiffed = False,
autodiff_nonfusible_nodes = ['aten::__getitem__'],
autodiff_fusible_nodes = [],
supports_sparse = False,
supports_scripting = False,
supports_sparse_csr = False,
test_conjugated_samples = True,
test_neg_view = True,
assert_jit_shape_analysis = False,
supports_expanded_weight = False)
```
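For reference, here is a standalone sketch of such a generic dataclass formatter; it is not the exact code added to `OpInfo.__str__`, and the alignment details and the `width` parameter are illustrative:
```python
import dataclasses
from pprint import pformat

def format_dataclass(obj, width: int = 80) -> str:
    """Format any dataclass as 'Name(field = value, ...)', one field per line."""
    assert dataclasses.is_dataclass(obj)
    cls = type(obj).__name__
    pad = " " * (len(cls) + 1)  # align fields under the opening paren
    parts = []
    for f in dataclasses.fields(obj):
        value = pformat(getattr(obj, f.name), width=width)
        # indent wrapped value lines so they line up under the '=' sign
        value = value.replace("\n", "\n" + pad + " " * (len(f.name) + 3))
        parts.append(f"{f.name} = {value}")
    return cls + "(" + (",\n" + pad).join(parts) + ")"
```
Something like `print(format_dataclass(op_info))` would then produce output in the style of the dump above.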
cc @ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76810
Approved by: https://github.com/ezyang
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement
trie-based IR node reuse. More importantly, the op for each subclass
should be unique for that subclass and thus making it a const static field
is a more logical design.
In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711
Approved by: https://github.com/wconstab, https://github.com/JackCaoG
This PR turns the previously introduced `ITensorList` into a more general `IList`
class. It is a container wrapper for arbitrary types (given their appropriate
implementations).
In summary, I have:
- Renamed `ITensorList` (its iterators and macros, for consistency) to `IList`
- Made `IList` a templated class (over an arbitrary type `T`), given that the type:
- Specializes `IListTagImpl<T, Tag>` for all `IListTag`
- Introduced type aliases (for both list and iterator types):
- `at::ITensorList` -> `c10::IList<at::Tensor>`
- `at::IOptTensorRefList` -> `c10::IList<at::OptionalTensorRef>`
- Added support for `Tensor?[]` in the structured codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69606
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76203
Request for comments:
This change adds extra code generator support to generate out variant wrappers for operators with unstructured kernels.
The current version generates 105 new out variant wrappers in addition to the existing 136 auto-generated out variant wrappers.
This change shows that a simple tweak can increase the generated op coverage to 16% (241/1559) among all native ops described in native_functions.yaml, no matter whether they are structured or not.
Command to generate out variant wrappers.
```
buck run //caffe2/torch/fb/jit:gen_static_runtime_ops
```
- AFTER this change
```
total grouped native ops: 1559
structured grouped native ops: 545
generated grouped native ops: 241
```
- BEFORE this change
```
total grouped native ops: 1503
structured grouped native ops: 540
generated grouped native ops: 136
```
To enable CI tests and make it easier to review, the generated ops are added in a separate diff: D35945633
More details:
We added a block list to skip generating around 10 operations that are deprecated or whose unit test would fail. All generated ops *compile*, but the generated unit tests may not pass due to the lack of hand-picked test input values for certain ops. Among the 42 ops whose unit test did not pass, 1 ("index_select") duplicates an existing op; 32 were fixed; and 9 were removed and blocked from generation, either because they are not commonly used in internal models (e.g. "cholesky", "linalg_householder_product", the sparse kernel "sspaddmm") or because they cause errors in static runtime (e.g. "conj_physical" leads to an error in the memory planner, and "binary_cross_entropy").
Test Plan:
OP generation:
```buck run //caffe2/torch/fb/jit:gen_static_runtime_ops```
Test generated ops:
```buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest```
Reviewed By: tenpercent
Differential Revision: D34913736
fbshipit-source-id: a6f408321653c3589ae1c76826177fc403d59c44
(cherry picked from commit 6f4501730478dbaeeea7f3ad4f9d29bf6787e7c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76471
Make `node_base_ctor_call` produce the entire node-base constructor call.
Previously it produced only the beginning of the call, which was unintended.
Addresses part of https://github.com/pytorch/xla/issues/3472
Test Plan: Imported from OSS
Reviewed By: qihqi, ngimel
Differential Revision: D35980436
Pulled By: wconstab
fbshipit-source-id: a443cf593ac7c35b2b65e72b82907e88e1e71c7a
(cherry picked from commit 360ad6d82a7e8303b8a60e61b177dabf0131ea8b)
This **roughly** corresponds to Goal 3.2 in https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw/edit#
Namely:
It adds the following:
* SymbolicIntNode interface
* LazySymbolicIntNode implementation
* Lazy `narrow_copy` implementation
* Support for `SymInt` in the codegen (which needed to be added)
* Test (below)
```cpp
TEST(LazyDynamicOpsTest, NarrowCopy) {
  auto x = torch::rand({5, 10, 10}).to(kLazy);
  const size_t Y_DIM = 3;
  const size_t X_DIM_INDEX = 2;
  auto y = torch::rand({Y_DIM}).to(kLazy);
  auto ly = torch::lazy::TryGetLtcTensor(y);
  auto dim_node = MakeNode<SizeNode>(ly->GetIrValue(), 0);
  auto lmn = new torch::lazy::SymbolicIntNode(dim_node);
  auto z = x.narrow_copy(X_DIM_INDEX, 0, lmn->toSymInt());
  AllClose(z.cpu(), x.cpu().narrow_copy(X_DIM_INDEX, 0, Y_DIM));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75759
Approved by: https://github.com/wconstab