pytorch/torchgen
Yutao Xu 79fb7416e7 [Intel GPU] Add device guard for XPU structured operator in torchgen (#138802)
This PR supplements https://github.com/pytorch/pytorch/pull/133980. That PR implemented the basic XPU device guard functionality, but we found that it did not cover structured operators.

With this PR, the code generated in RegisterXPU.cpp is as follows. The device guard (`guard_`) is now generated correctly.

```c++
struct structured_exp_out_functional final : public at::native::structured_exp_out {
    void set_output_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        auto current_device = guard_.current_device();
        if (C10_UNLIKELY(current_device.has_value())) {
          TORCH_INTERNAL_ASSERT(*current_device == options.device(),
            "structured kernels don't support multi-device outputs");
        } else {
          guard_.reset_device(options.device());
        }
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    void set_output_raw_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        auto current_device = guard_.current_device();
        if (C10_UNLIKELY(current_device.has_value())) {
          TORCH_INTERNAL_ASSERT(*current_device == options.device(),
            "structured kernels don't support multi-device outputs");
        } else {
          guard_.reset_device(options.device());
        }
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    const Tensor& maybe_get_output(int64_t output_idx) override {
      return outputs_[output_idx];
    }
    std::array<Tensor, 1> outputs_;
    c10::OptionalDeviceGuard guard_;
};
```
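The generated prologue binds the guard to the device of the first output and rejects later outputs that live on a different device. A minimal, self-contained C++ sketch of that "bind on first output, check on later outputs" logic follows. `Device`, `OptionalDeviceGuardSketch`, and `StructuredOutputsSketch` are hypothetical stand-ins, not the c10 types. The real `c10::OptionalDeviceGuard::reset_device` also switches the current device and restores the previous one when the guard is destroyed, which this sketch does not model.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for c10::Device.
struct Device {
  std::string type;  // e.g. "xpu"
  int index;
  bool operator==(const Device& other) const {
    return type == other.type && index == other.index;
  }
};

// Hypothetical, simplified stand-in for c10::OptionalDeviceGuard: it starts
// unbound and remembers the device passed to the first reset_device() call.
// (The real guard also sets the current device and restores it on destruction.)
class OptionalDeviceGuardSketch {
 public:
  std::optional<Device> current_device() const { return device_; }
  void reset_device(const Device& d) { device_ = d; }

 private:
  std::optional<Device> device_;
};

// Mirrors the prologue generated at the top of set_output_strided /
// set_output_raw_strided in the XPU code above.
struct StructuredOutputsSketch {
  OptionalDeviceGuardSketch guard_;
  std::vector<Device> output_devices_;

  void set_output(const Device& requested) {
    auto current_device = guard_.current_device();
    if (current_device.has_value()) {
      // Later outputs must be on the device chosen by the first output.
      // The generated code uses TORCH_INTERNAL_ASSERT here; we throw instead.
      if (!(*current_device == requested)) {
        throw std::runtime_error(
            "structured kernels don't support multi-device outputs");
      }
    } else {
      // First output: bind the guard to this output's device.
      guard_.reset_device(requested);
    }
    // Stand-in for create_out(...): record where the output was allocated.
    output_devices_.push_back(requested);
  }
};
```

Because the guard is a member of the wrapper struct, it stays bound for as long as that struct is alive, so the kernel body that runs after the outputs are allocated also executes under the output's device.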

Without this change, the generated code contains no device guard:

```c++
struct structured_exp_out_functional final : public at::native::structured_exp_out {
    void set_output_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    void set_output_raw_strided(
        int64_t output_idx, IntArrayRef sizes, IntArrayRef strides,
        TensorOptions options, DimnameList names
    ) override {
        outputs_[output_idx] = create_out(sizes, strides, options);
        if (!names.empty()) {
          namedinference::propagate_names(outputs_[output_idx], names);
        }
        // super must happen after, so that downstream can use maybe_get_output
        // to retrieve the output
        at::native::structured_exp_out::set_output_raw_strided(output_idx, sizes, strides, options, names);
    }
    const Tensor& maybe_get_output(int64_t output_idx) override {
      return outputs_[output_idx];
    }
    std::array<Tensor, 1> outputs_;
};
```
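The two listings differ only in whether the guard prologue and the `guard_` member are emitted, which suggests the generator makes that choice per backend, and this PR makes XPU one of the guarded backends. The C++ sketch below illustrates that idea under exactly this assumption. `DispatchKeySketch`, `needs_device_guard`, and `gen_set_output_body` are hypothetical names. torchgen itself is written in Python, so its actual logic may be organized differently, and it must also declare the `guard_` member, which this sketch omits.

```cpp
#include <cassert>
#include <string>

// Hypothetical dispatch-key enum; not torchgen's real DispatchKey.
enum class DispatchKeySketch { CPU, CUDA, MPS, XPU, Meta };

// Assumption: only backends that need a device guard get the prologue.
bool needs_device_guard(DispatchKeySketch key) {
  switch (key) {
    case DispatchKeySketch::CUDA:
    case DispatchKeySketch::MPS:
    case DispatchKeySketch::XPU:  // the backend this PR adds guard support for
      return true;
    default:
      return false;
  }
}

// Builds the start of a generated set_output_* override body as text.
std::string gen_set_output_body(DispatchKeySketch key) {
  std::string body;
  if (needs_device_guard(key)) {
    body +=
        "auto current_device = guard_.current_device();\n"
        "if (C10_UNLIKELY(current_device.has_value())) {\n"
        "  TORCH_INTERNAL_ASSERT(*current_device == options.device(),\n"
        "    \"structured kernels don't support multi-device outputs\");\n"
        "} else {\n"
        "  guard_.reset_device(options.device());\n"
        "}\n";
  }
  // Every backend allocates the output the same way.
  body += "outputs_[output_idx] = create_out(sizes, strides, options);\n";
  return body;
}
```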

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138802
Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/ezyang
2024-11-13 05:40:38 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| `_autoheuristic` | [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577) | 2024-10-11 18:30:26 +00:00 |
| `aoti` | [aoti] Add masked_select to cshim (#139071) | 2024-10-31 21:52:53 +00:00 |
| `api` | [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577) | 2024-10-11 18:30:26 +00:00 |
| `decompositions` | [BE][Easy] eliminate relative import in torchgen (#128872) | 2024-06-21 14:11:46 +00:00 |
| `dest` | [Intel GPU] Add device guard for XPU structured operator in torchgen (#138802) | 2024-11-13 05:40:38 +00:00 |
| `executorch` | [Reland][7/N] Fix Wextra-semi warning (#140342) | 2024-11-12 18:55:31 +00:00 |
| `fuse` | [BE] update type annotations for basic utilities in `torch/__init__.py` (#129001) | 2024-06-24 18:04:38 +00:00 |
| `operator_versions` | [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577) | 2024-10-11 18:30:26 +00:00 |
| `selective_build` | [BE][Easy] enable postponed annotations in torchgen (#129376) | 2024-06-29 09:23:39 +00:00 |
| `shape_functions` | [BE][Easy] enable postponed annotations in torchgen (#129376) | 2024-06-29 09:23:39 +00:00 |
| `static_runtime` | [6/N] Fix Wextra-semi warning (#139605) | 2024-11-04 13:43:16 +00:00 |
| `__init__.py` | | |
| `BUCK.oss` | | |
| `BUILD.bazel` | | |
| `build.bzl` | | |
| `code_template.py` | [BE][Easy] enable postponed annotations in torchgen (#129376) | 2024-06-29 09:23:39 +00:00 |
| `context.py` | [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577) | 2024-10-11 18:30:26 +00:00 |
| `gen_aoti_c_shim.py` | [AOTI] Introduce an extensibility mechanism for the c shim codegen to make it easy to produce c shims for out-of-tree OP kernels as well. Add c_shim for XPU. (#136742) | 2024-11-09 13:19:52 +00:00 |
| `gen_backend_stubs.py` | [Reland][7/N] Fix Wextra-semi warning (#140342) | 2024-11-12 18:55:31 +00:00 |
| `gen_executorch.py` | [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577) | 2024-10-11 18:30:26 +00:00 |
| `gen_functionalization_type.py` | [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577) | 2024-10-11 18:30:26 +00:00 |
| `gen_lazy_tensor.py` | [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577) | 2024-10-11 18:30:26 +00:00 |
| `gen_schema_utils.py` | [HOP] support generating schema for hop (#133521) | 2024-08-21 17:34:21 +00:00 |
| `gen_vmap_plumbing.py` | Added batching rule for sdpa_math, sdpa_efficient_attention forward, cudnn, and flash attention (#133964) | 2024-08-22 05:29:49 +00:00 |
| `gen.py` | [Reland][7/N] Fix Wextra-semi warning (#140342) | 2024-11-12 18:55:31 +00:00 |
| `local.py` | [BE][Easy] enable postponed annotations in torchgen (#129376) | 2024-06-29 09:23:39 +00:00 |
| `model.py` | [Intel GPU] Support RegisterSparseXPU.cpp codegen. (#139267) | 2024-11-13 01:41:43 +00:00 |
| `native_function_generation.py` | [BE][Ez]: Use interned hardcoded string FURB156 (#138330) | 2024-10-18 18:26:16 +00:00 |
| `utils.py` | [torchgen] reference generated comment to actual location of the generator and template (#130020) | 2024-07-05 21:47:14 +00:00 |
| `yaml_utils.py` | | |