pytorch/cmake
Stephen Jia 545d2126f6 [pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation (#115948)
Summary:
This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

## Review Guide

Many files changed because every GLSL shader had to be touched. Most of the changes simply convert

```
#define PRECISION $precision
#define FORMAT $format
```

to

```
#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}
```

due to changes in how shader templates are processed.
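As a rough illustration of the new placeholder style, the sketch below substitutes `${NAME}` placeholders from a parameter dictionary. It is not the code in `gen_vulkan_spv.py`: the `substitute` helper, the regular expression, and the parameter values `highp` and `rgba16f` are assumptions made for this example.

```python
import re

# Hypothetical helper for illustration; gen_vulkan_spv.py may work differently.
# Matches ${NAME}, where NAME is a Python-style identifier.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(text: str, params: dict) -> str:
    """Replace each ${NAME} with str(params[NAME]); unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), text)


# Parameter values here are illustrative assumptions.
params = {"PRECISION": "highp", "FORMAT": "rgba16f"}
template = "#define PRECISION ${PRECISION}\n#define FORMAT ${FORMAT}"
rendered = substitute(template, params)
print(rendered)
```

Under this sketch, the old `$precision` spelling is no longer recognized as a placeholder and passes through unchanged, which is consistent with every shader needing to be updated.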

For reviewers, the primary functional changes to review are:

* `gen_vulkan_spv.py`
  * Majority of functional changes are in this file, which controls how shader templates are processed.
* `shader_params.yaml`
  * Controls how shader variants are generated.

## Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. every `.glsl` file) is treated as a shader template. To that end, the `templates/` folder has been removed and there is now a global `shader_params.yaml` file that describes the shader variants to generate for all shader templates.

**Taking inspiration from XNNPACK's [`xngen` tool](https://github.com/google/XNNPACK/blob/master/tools/xngen.py), shader templates can now use Python codeblocks**. One example is:

```
$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
```

Another is:

```
  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
```

In addition to making shader templates easier and clearer to write, this allows shaders that previously could not be consolidated, such as the inplace and non-inplace variants of the same shader, to be expressed as a single template.
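To make the `$if` / `$else:` mechanism concrete, here is a minimal sketch of a preprocessor that keeps or drops indented blocks based on a parameter dictionary. It is an assumption-laden toy, not the implementation in `gen_vulkan_spv.py` or `xngen`: the `render_template` function is hypothetical, it supports only `$if` and `$else:`, and it relies on indentation to find the end of each block.

```python
def render_template(template: str, params: dict) -> str:
    """Keep or drop indented blocks guarded by `$if COND:` / `$else:` lines."""
    # Each open block is [directive_indent, condition_value, body_shift].
    stack = []
    out = []
    for line in template.splitlines():
        stripped = line.strip()
        if not stripped:
            # Blank lines never close a block; keep them only in active code.
            if all(block[1] for block in stack):
                out.append("")
            continue
        indent = len(line) - len(line.lstrip())
        # A line at or left of a directive's indentation ends that block,
        # unless it is the `$else:` that belongs to it.
        while stack and indent <= stack[-1][0]:
            if stripped == "$else:" and indent == stack[-1][0]:
                break
            stack.pop()
        if stripped == "$else:":
            if not stack:
                raise ValueError("$else: without a matching $if")
            stack[-1][1] = not stack[-1][1]
            continue
        # The first line inside a block fixes how far its body is indented.
        if stack and stack[-1][2] is None:
            stack[-1][2] = indent - stack[-1][0]
        if stripped.startswith("$if ") and stripped.endswith(":"):
            # Evaluate the condition as a Python expression over the parameters.
            cond = bool(eval(stripped[4:-1], {"__builtins__": {}}, dict(params)))
            stack.append([indent, cond, None])
            continue
        if all(block[1] for block in stack):
            # Strip the extra indentation introduced by the enclosing blocks.
            shift = sum(block[2] for block in stack)
            out.append(line[shift:])
    return "\n".join(out)


# A shortened version of the template above, for illustration only.
TEMPLATE = """\
$if not INPLACE:
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
$else:
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
void main() {}"""

out_of_place = render_template(TEMPLATE, {"INPLACE": 0})
in_place = render_template(TEMPLATE, {"INPLACE": 1})
print(in_place)
```

With `INPLACE` set to 1, only the `$else:` branch and the trailing unconditional line survive, so one template yields both the inplace and non-inplace shader source.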

## `generate_variant_forall` in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a `generate_variant_forall` field to iterate over a list of values for a given parameter, generating one shader per value for each variant defined. Example:

```
unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)
```

Previously, the `inplace` variants needed separate `shader_variants` entries. If multiple parameters are iterated over, all possible combinations are generated. Reviewers should take a look at how the new YAML configuration works.
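The expansion described above can be sketched as a Cartesian product over the `generate_variant_forall` entries. This is an illustrative approximation rather than the code in `gen_vulkan_spv.py`: the `expand_variants` function is hypothetical, the configuration is written as a Python dictionary to avoid a YAML dependency, and joining a non-empty `SUFFIX` to the variant name with an underscore is an assumption about the naming scheme.

```python
from itertools import product

# The unary_op entry from the YAML example above, as a Python dict.
UNARY_OP = {
    "parameter_names_with_default_values": {"OPERATOR": "exp(X)", "INPLACE": 0},
    "generate_variant_forall": {
        "INPLACE": [
            {"VALUE": 0, "SUFFIX": ""},
            {"VALUE": 1, "SUFFIX": "inplace"},
        ],
    },
    "shader_variants": [
        {"NAME": "exp", "OPERATOR": "exp(X)"},
        {"NAME": "sqrt", "OPERATOR": "sqrt(X)"},
        {"NAME": "log", "OPERATOR": "log(X)"},
    ],
}


def expand_variants(config: dict) -> list:
    """Return one parameter dict per (variant, forall-combination) pair."""
    defaults = config["parameter_names_with_default_values"]
    forall = config.get("generate_variant_forall", {})
    names = list(forall)
    variants = []
    for entry in config["shader_variants"]:
        # product() over all iterated parameters yields every combination;
        # with no iterated parameters it yields a single empty combination.
        for combo in product(*(forall[n] for n in names)):
            params = {**defaults, **entry}
            name = entry["NAME"]
            for param_name, choice in zip(names, combo):
                params[param_name] = choice["VALUE"]
                if choice["SUFFIX"]:
                    # Assumed naming scheme: append non-empty suffixes with "_".
                    name += "_" + choice["SUFFIX"]
            params["NAME"] = name
            variants.append(params)
    return variants


variants = expand_variants(UNARY_OP)
variant_names = [v["NAME"] for v in variants]
print(variant_names)
```

Three `shader_variants` entries and two `INPLACE` values give six generated shaders in this sketch; adding a second iterated parameter with two values would double that to twelve.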

Test Plan:
This diff makes no functional change; we only need to make sure that the generated shaders are still correct, so running `vulkan_api_test` is sufficient.

```
# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"
```

Reviewed By: digantdesai

Differential Revision: D52087084

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115948
Approved by: https://github.com/manuelcandales
2023-12-20 05:47:33 +00:00
..
External Revert "Initial Flash Attention support on ROCM (#114309)" (#115975) 2023-12-16 03:40:14 +00:00
Modules [cmake] set 'mcpu=generic' as the default build flag for mkldnn on aarch64 (#113820) 2023-11-22 02:49:33 +00:00
Modules_CUDA_fix Add 9.0a to cpp_extension supported compute archs (#110587) 2023-10-05 17:41:06 +00:00
public Revert "[ROCm] add hipblaslt support (#114329)" 2023-12-19 01:04:58 +00:00
Allowlist.cmake
BuildVariables.cmake
Caffe2Config.cmake.in Don't find MKL if it isn't used (#109426) 2023-09-17 03:39:39 +00:00
CheckAbi.cmake remove abi uncertainty and potential abi conflict (#94306) 2023-02-09 09:54:04 +00:00
cmake_uninstall.cmake.in
Codegen.cmake [Cmake] Check that gcc-9.4 or newer is used (#112858) 2023-11-06 17:19:53 +00:00
DebugHelper.cmake
Dependencies.cmake Revert "[ROCm] add hipblaslt support (#114329)" 2023-12-19 01:04:58 +00:00
FlatBuffers.cmake [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201) 2022-01-12 16:30:39 -08:00
GoogleTestPatch.cmake Simplify cmake code (#91546) 2023-02-08 01:05:19 +00:00
IncludeSource.cpp.in
iOS.cmake [executorch] Update iOS toolchain with a modern cmake syntax. (#115799) 2023-12-15 00:51:30 +00:00
Metal.cmake [CI] Compile on M1 natively (#95719) 2023-03-01 04:20:42 +00:00
MiscCheck.cmake [BE] Cleanup CMake flag suppressions (#97584) 2023-03-27 18:46:09 +00:00
ProtoBuf.cmake [BE] Cleanup CMake flag suppressions (#97584) 2023-03-27 18:46:09 +00:00
ProtoBufPatch.cmake Migrate PyTorch to C++17 (#85969) 2022-12-08 02:27:48 +00:00
Summary.cmake Revert "Initial Flash Attention support on ROCM (#114309)" (#115975) 2023-12-16 03:40:14 +00:00
TorchConfig.cmake.in Revert "[Reland2] Update NVTX to NVTX3 (#109843)" 2023-12-05 16:10:20 +00:00
TorchConfigVersion.cmake.in
VulkanCodegen.cmake [pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation (#115948) 2023-12-20 05:47:33 +00:00
VulkanDependencies.cmake [Vulkan] Remove GLSL Code Gen (#91912) 2023-01-10 20:29:47 +00:00