pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xuehai Pan	8a67daf283	[BE][Easy] enable postponed annotations in `tools` (#129375) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-29 09:23:35 +00:00
PyTorch MergeBot	a32ce5ce34	Revert "[BE][Easy] enable postponed annotations in `tools` (#129375)" This reverts commit `59eb2897f1`. Reverted https://github.com/pytorch/pytorch/pull/129375 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I need to revert to cleanly revert https://github.com/pytorch/pytorch/pull/129374, please do a rebase and reland this ([comment](https://github.com/pytorch/pytorch/pull/129375#issuecomment-2197800541))	2024-06-29 00:44:25 +00:00
Xuehai Pan	59eb2897f1	[BE][Easy] enable postponed annotations in `tools` (#129375) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-28 15:37:54 +00:00
Stephen Jia	545d2126f6	[pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation (#115948) Summary: This change makes two major improvements to PyTorch Vulkan's shader authoring workflow. ## Review Guide There are a lot of changed files because every GLSL shader had to be touched. The majority of the changes replace ``` #define PRECISION $precision #define FORMAT $format ``` with ``` #define PRECISION ${PRECISION} #define FORMAT ${FORMAT} ``` due to changes in how shader templates are processed. For reviewers, the primary functional changes to review are: * `gen_vulkan_spv.py` * Majority of functional changes are in this file, which controls how shader templates are processed. * `shader_params.yaml` * controls how shader variants are generated ## Python Codeblocks in Shader Templates From now on, every compute shader (i.e. `.glsl`) is treated as a shader template. To this end, the `templates/` folder has been removed and there is now a global `shader_params.yaml` file to describe the shader variants that should be generated for all shader templates. Taking inspiration from XNNPACK's [`xngen` tool](https://github.com/google/XNNPACK/blob/master/tools/xngen.py), shader templates can now use Python codeblocks. 
One example is: ``` $if not INPLACE: layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput; layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput; layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther; layout(set = 0, binding = 3) uniform PRECISION restrict Block { ivec4 output_sizes; ivec4 input_sizes; ivec4 other_sizes; float alpha; } uArgs; $else: layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput; layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther; layout(set = 0, binding = 2) uniform PRECISION restrict Block { ivec4 output_sizes; ivec4 other_sizes; float alpha; } uArgs; ``` Another is: ``` // PYTHON CODEBLOCK $if not IS_DIV: const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4; if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) { ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3); vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z))); other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask; } // PYTHON CODEBLOCK $if not INPLACE: ivec3 input_pos = map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes); const vec4 in_texel = load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput); imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha)); $else: const vec4 in_texel = imageLoad(uOutput, pos); imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha)); ``` In addition to making it easier and clearer to write shader templates, this enables shaders that were previously unable to be consolidated into a single template to now be represented using a single template, such as non inplace and inplace variants of the same shader. ## `generate_variant_forall` in shader variant YAML configuration YAML files that describe how shader variants should be generated can now use a `generate_variant_forall` field to iterate over various settings for a specific parameter for each variant defined. 
Example: ``` unary_op: parameter_names_with_default_values: OPERATOR: exp(X) INPLACE: 0 generate_variant_forall: INPLACE: - VALUE: 0 SUFFIX: "" - VALUE: 1 SUFFIX: "inplace" shader_variants: - NAME: exp OPERATOR: exp(X) - NAME: sqrt OPERATOR: sqrt(X) - NAME: log OPERATOR: log(X) ``` Previously, the `inplace` variants would need to have separate `shader_variants` entries. If there are multiple variables that need to be iterated across, then all possible combinations will be generated. Would be good to take a look to see how the new YAML configuration works. Test Plan: There is no functional change to this diff; we only need to make sure that the generated shaders are still correct. Therefore, we only need to run `vulkan_api_test`. ``` # On Mac Laptop buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*" ``` Reviewed By: digantdesai Differential Revision: D52087084 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115948 Approved by: https://github.com/manuelcandales	2023-12-20 05:47:33 +00:00
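The `generate_variant_forall` expansion described in the commit above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in `gen_vulkan_spv.py`: the function name `expand_variants` is hypothetical, and joining a non-empty `SUFFIX` to the variant `NAME` with an underscore is an assumed naming rule that the commit message does not spell out.

```python
from itertools import product


def expand_variants(config):
    """Expand each shader variant over every combination of forall settings.

    `config` mirrors one entry of the YAML example (e.g. `unary_op`).
    Hypothetical helper; the suffix-joining rule is an assumption.
    """
    defaults = dict(config["parameter_names_with_default_values"])
    forall = config.get("generate_variant_forall", {})
    keys = list(forall)
    expanded = []
    for variant in config["shader_variants"]:
        # product() over zero iterables yields one empty combination,
        # so entries without a forall section still produce one variant.
        for combo in product(*(forall[key] for key in keys)):
            params = dict(defaults)   # start from the default values
            params.update(variant)    # then apply the variant's overrides
            name_parts = [variant["NAME"]]
            for key, choice in zip(keys, combo):
                params[key] = choice["VALUE"]
                if choice["SUFFIX"]:
                    name_parts.append(choice["SUFFIX"])
            params["NAME"] = "_".join(name_parts)
            expanded.append(params)
    return expanded


unary_op = {
    "parameter_names_with_default_values": {"OPERATOR": "exp(X)", "INPLACE": 0},
    "generate_variant_forall": {
        "INPLACE": [
            {"VALUE": 0, "SUFFIX": ""},
            {"VALUE": 1, "SUFFIX": "inplace"},
        ]
    },
    "shader_variants": [
        {"NAME": "exp", "OPERATOR": "exp(X)"},
        {"NAME": "sqrt", "OPERATOR": "sqrt(X)"},
        {"NAME": "log", "OPERATOR": "log(X)"},
    ],
}

print([v["NAME"] for v in expand_variants(unary_op)])
# ['exp', 'exp_inplace', 'sqrt', 'sqrt_inplace', 'log', 'log_inplace']
```

With a second key under `generate_variant_forall`, `itertools.product` would yield every pairing of the two settings, which matches the "all possible combinations" behaviour the commit message describes.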
Justin Chu	14d87bb5ff	[BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105428 Approved by: https://github.com/albanD, https://github.com/soulitzer, https://github.com/malfet	2023-07-19 01:24:44 +00:00
Lucy Qiu	a4021af42e	[Pytorch] General broadcast for arithmetic operators (#104718) Summary: Currently, broadcast is supported for 4D tensors where, if the batch or channel dimensions are not equal, then the batch and channel of one tensor must both be 1, i.e.: ``` tensorA NCHW: 5, 2, 3, 3 tensorB NCHW: 1, 1, 3, 3 --> batch=1, channel=1 ``` This diff adds broadcast support for 4D tensors where the batch and channel of a tensor are different, i.e.: ``` tensorA NCHW: 5, 1, 3, 3 tensorB NCHW: 1, 5, 3, 3 ``` Broadcast rules: ``` - tensorA.dim()[x] == tensorB.dim()[x] - tensorA.dim()[x] == 1 || tensorB.dim()[x] == 1 - tensorA.dim()[x] does not exist || tensorB.dim()[x] does not exist ``` Broadcast method: 1. Pass `output`, `input` and `other` tensors to the shader 2. Iterate through the output texture to calculate the value of each texel (no repeating) 3. Mapping NHW positions: use modulo 4. Mapping C position: divide pos.z by ceil(C/4) to map to original tensor range --- Also some test refactoring to reduce repeated setup code. 
Test Plan: New tests: Add ``` [ RUN ] VulkanAPITest.add_broadcast5 [ OK ] VulkanAPITest.add_broadcast5 (0 ms) [ RUN ] VulkanAPITest.add_broadcast6 [ OK ] VulkanAPITest.add_broadcast6 (0 ms) ``` Sub ``` [ RUN ] VulkanAPITest.sub_broadcast5 [ OK ] VulkanAPITest.sub_broadcast5 (0 ms) [ RUN ] VulkanAPITest.sub_broadcast6 [ OK ] VulkanAPITest.sub_broadcast6 (0 ms) ``` Mul ``` [ RUN ] VulkanAPITest.mul_broadcast5 [ OK ] VulkanAPITest.mul_broadcast5 (1 ms) [ RUN ] VulkanAPITest.mul_broadcast6 [ OK ] VulkanAPITest.mul_broadcast6 (1 ms) ``` Div ``` [ RUN ] VulkanAPITest.div_broadcast5 [ OK ] VulkanAPITest.div_broadcast5 (1 ms) [ RUN ] VulkanAPITest.div_broadcast6 [ OK ] VulkanAPITest.div_broadcast6 (2 ms) ``` All tests: https://www.internalfb.com/phabricator/paste/view/P781794761 Run clang-format on glsl files and Arithmetic.cpp Differential Revision: D46874508 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104718 Approved by: https://github.com/SS-JIA	2023-07-18 00:15:19 +00:00
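The broadcast rules and the modulo mapping listed in the commit above can be illustrated on plain shape tuples. This is a sketch in Python rather than the GLSL shader itself; the helper names `broadcast_shape` and `map_output_index_to_input_index` are hypothetical, and the texel-packed channel mapping (dividing `pos.z` by `ceil(C/4)`) is deliberately left out.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast output shape, or raise if the rules are violated.

    Dimensions are compared from the last one backwards; a missing
    dimension is treated as size 1 (the "does not exist" rule).
    """
    out = []
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        out.append(max(a, b))
    return tuple(reversed(out))


def map_output_index_to_input_index(out_index, in_shape):
    """Map an output index to the input element it reads, using modulo.

    A size-1 input dimension always maps to index 0; an equal-sized
    dimension maps to the same index.
    """
    offset = len(out_index) - len(in_shape)  # right-align the input shape
    return tuple(i % s for i, s in zip(out_index[offset:], in_shape))


tensor_a = (5, 1, 3, 3)  # NCHW
tensor_b = (1, 5, 3, 3)  # NCHW
print(broadcast_shape(tensor_a, tensor_b))                       # (5, 5, 3, 3)
print(map_output_index_to_input_index((4, 2, 1, 0), tensor_a))   # (4, 0, 1, 0)
print(map_output_index_to_input_index((4, 2, 1, 0), tensor_b))   # (0, 2, 1, 0)
```

Each output element is computed once from the mapped positions, so neither input has to be physically repeated, which corresponds to step 2 of the broadcast method ("no repeating").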
salilsdesai	ec94cbc66a	[Vulkan] Remove GLSL Code Gen (#91912) @bypass-github-export-checks GLSL Code Gen is not used, so this diff removes - GLSL parts of ShaderSource - Anything enclosed by USE_VULKAN_SHADERC_RUNTIME, as well as the flag itself - gen_vulkan_glsl script Plus some additional refactoring Differential Revision: [D41358861](https://our.internmc.facebook.com/intern/diff/D41358861/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41358861/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/91912 Approved by: https://github.com/mcr229	2023-01-10 20:29:47 +00:00
Kimish Patel	bd456fb549	[Pytorch][Vulkan] shader codegen use ordered dictionary (#89951) When not using an ordered dictionary, parameter values can end up in a different order for each specialization. This can result in shader names that are inconsistent in the order and meaning of the template parameter values that appear in them. For example, if you have: conv2d_pw: default_values: - X: 1 - Y: 2 parameter_values: - Y: 3 The default parameter values can generate a shader named 'my_shader_1x2', where 1x2 is for the X, Y parameters respectively. Then, for the non-default values, of which there is only one, we have Y=3, and with the existing implementation you can end up generating a shader named 'my_shader_3x1'. Here 3 is for Y and 1 is for X. This leads to confusing shader names. This diff fixes this by 1. using an ordered dict. 2. updating non-default values by first copying the default values and then updating them. Differential Revision: [D41006639](https://our.internmc.facebook.com/intern/diff/D41006639/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89951 Approved by: https://github.com/salilsdesai	2022-12-06 00:49:35 +00:00
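The copy-defaults-then-update fix described in the commit above can be sketched as follows. The helper names `specialize` and `shader_name`, and the rule of joining values with `x`, are assumptions made to reproduce the `my_shader_1x2` example; they are not taken from the actual codegen script.

```python
from collections import OrderedDict


def specialize(defaults, overrides):
    """Copy the default parameters, then apply the overrides.

    Updating an existing key keeps its original position, so every
    specialization lists its parameters in the same order as the defaults.
    """
    params = OrderedDict(defaults)
    params.update(overrides)
    return params


def shader_name(base, params):
    """Build a name such as 'my_shader_1x2' from the ordered values."""
    return base + "_" + "x".join(str(value) for value in params.values())


defaults = OrderedDict([("X", 1), ("Y", 2)])
print(shader_name("my_shader", specialize(defaults, {})))        # my_shader_1x2
print(shader_name("my_shader", specialize(defaults, {"Y": 3})))  # my_shader_1x3
```

Because the override only changes the value stored under `Y`, the name always reads as X first and Y second, avoiding the `my_shader_3x1` ordering described in the commit message.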
Kimish Patel	893f8e3790	[PyTorch][Vulkan] Add template based codegen for shader generation (#88323) We would like to be able to parameterize kernels such that a parameterized algorithm can be implemented via templates. We can then profile the performance of a kernel with different parameter values. This enables us to determine which parameters may work best for a given kernel or a given device. In this diff, one such kernel is added for 1x1 conv, which is parameterized across the size of the tile produced by each invocation. A few other options for parameters: - dtype could also be a parameter, so that we can do compute in fp16 or int8/int16. - Register blocking for input channels Differential Revision: [D40280336](https://our.internmc.facebook.com/intern/diff/D40280336/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40280336/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/88323 Approved by: https://github.com/jmdetloff	2022-11-03 19:51:51 +00:00
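The idea of generating one kernel source per parameter setting, as described in the commit above, can be sketched with Python's standard `string.Template`. Everything specific here is an assumption for illustration: the macro names `TILE_SIZE_X` and `TILE_SIZE_Y`, the placeholder kernel body, and the helper `generate_variants` do not come from the commit, which does not show its template syntax.

```python
from string import Template

# Hypothetical kernel template; only the tile-size parameters are filled in.
KERNEL_TEMPLATE = Template(
    "#define TILE_SIZE_X ${TILE_X}\n"
    "#define TILE_SIZE_Y ${TILE_Y}\n"
    "// ... kernel body that uses the tile size ...\n"
)


def generate_variants(base_name, tile_sizes):
    """Return a {variant_name: source} mapping, one entry per tile size."""
    variants = {}
    for tile_x, tile_y in tile_sizes:
        name = f"{base_name}_{tile_x}x{tile_y}"
        variants[name] = KERNEL_TEMPLATE.substitute(TILE_X=tile_x, TILE_Y=tile_y)
    return variants


for name, source in generate_variants("conv2d_pw", [(1, 1), (2, 2)]).items():
    print(name)
    print(source)
```

Each generated variant could then be compiled and profiled separately to compare tile sizes on a given device, which is the workflow the commit message motivates.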

9 Commits