Summary: Vulkan rewrite so that quantized transpose 2d ops can run in a model
Test Plan:
Run vulkan api test:
# buck2 build --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
# buck-out/v2/gen/fbsource/xplat/caffe2/pt_vulkan_api_test_binAppleMac
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
[==========] Running 418 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 418 tests from VulkanAPITest
....
[----------] Global test environment tear-down
[==========] 418 tests from 1 test suite ran. (4510 ms total)
[ PASSED ] 417 tests.
[ SKIPPED ] 1 test, listed below:
[ SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
YOU HAVE 9 DISABLED TESTS
Run quantized vulkan api test: Note that the quantized linear tests are failing, but all the convolution tests still pass. The linear failures are being debugged.
# buck2 build --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
# buck-out/v2/gen/fbsource/xplat/caffe2/pt_vulkan_quantized_api_test_binAppleMac
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
[==========] Running 86 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 86 tests from VulkanAPITest
...
[ PASSED ] 77 tests.
[ FAILED ] 9 tests, listed below:
[ FAILED ] VulkanAPITest.linear_2d_flat
[ FAILED ] VulkanAPITest.linear_2d_small
[ FAILED ] VulkanAPITest.linear_2d_large
[ FAILED ] VulkanAPITest.linear_3d_flat
[ FAILED ] VulkanAPITest.linear_3d_small
[ FAILED ] VulkanAPITest.linear_3d_large
[ FAILED ] VulkanAPITest.linear_4d_flat
[ FAILED ] VulkanAPITest.linear_4d_small
[ FAILED ] VulkanAPITest.linear_4d_large
9 FAILED TESTS
YOU HAVE 8 DISABLED TESTS
# Run CUNET quantized model on hibiki board.
Reviewed By: manuelcandales
Differential Revision: D52344263
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122547
Approved by: https://github.com/manuelcandales, https://github.com/copyrightly, https://github.com/yipjustin
Summary:
`conv1d` has two arguments, `weight` and `bias`, which are stored as constant tensors on the CPU and transferred to the GPU at every inference call. We create a context for this operator to avoid the repeated transfer. Specifically, we
- created `Conv1dPackedContext`, `create_conv1d_context` and `run_conv1d_context` in `Convolution.h` and `Convolution.cpp`
- registered them in `Register.cpp`
- rewrote the graph representation of the op in `vulkan_rewrite.cpp`
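A minimal Python sketch of the pack-once/run-many pattern (the real implementation is C++, where context creation also uploads the constants to the GPU once; the class and argument names below are illustrative):
```
import torch
import torch.nn.functional as F

class Conv1dContext:
    """Toy stand-in for Conv1dPackedContext: capture the constants once."""
    def __init__(self, weight, bias, stride=1, padding=0, dilation=1, groups=1):
        # "create_conv1d_context": done once, outside the inference loop;
        # the real backend transfers weight/bias to the GPU here
        self.weight, self.bias = weight, bias
        self.args = (stride, padding, dilation, groups)

    def run(self, x):
        # "run_conv1d_context": every inference call reuses the stored constants
        return F.conv1d(x, self.weight, self.bias, *self.args)

ctx = Conv1dContext(torch.randn(33, 16, 3), torch.randn(33))
out = ctx.run(torch.randn(20, 16, 50))  # shape (20, 33, 48)
```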
Test Plan:
## Numerical test
```
[luwei@82308.od /data/sandcastle/boxes/fbsource (8a8d911dc)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*conv1d*"
Buck UI: https://www.internalfb.com/buck2/7760800b-fd75-479a-9368-be5fcd5a7fef
Network: Up: 0B Down: 0B
Jobs completed: 4. Time elapsed: 0.6s.
BUILD SUCCEEDED
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *conv1d*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from VulkanAPITest
[ RUN ] VulkanAPITest.conv1d_simple
[ OK ] VulkanAPITest.conv1d_simple (159 ms)
[ RUN ] VulkanAPITest.conv1d
[ OK ] VulkanAPITest.conv1d (57 ms)
[----------] 2 tests from VulkanAPITest (217 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (217 ms total)
[ PASSED ] 2 tests.
```
Full test results are in P1053644934; summary below:
```
[----------] 419 tests from VulkanAPITest (28080 ms total)
[----------] Global test environment tear-down
[==========] 419 tests from 1 test suite ran. (28080 ms total)
[ PASSED ] 418 tests.
[ SKIPPED ] 1 test, listed below:
[ SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
```
## Graph representation comparison
We created a model using `conv1d` and traced it as shown below:
```
# Define a simple model that uses conv1d
import torch
import torch.nn as nn

class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1d = nn.Conv1d(16, 33, 3)

    def forward(self, x):
        return self.conv1d(x)

# Create an instance of the model
model = MyModel()
# Create a dummy input tensor for tracing
input_tensor = torch.randn(20, 16, 50)
# Use torch.jit.trace to trace the model and generate a graph
traced_model = torch.jit.trace(model, input_tensor)
```
Then we converted the traced model to the Vulkan backend using `optimize_for_mobile`:
```
from torch.utils import mobile_optimizer

to_preserve = []  # no extra methods need preserving for this toy model
vulkan_model = mobile_optimizer.optimize_for_mobile(
    traced_model, backend="vulkan", preserved_methods=to_preserve
)
```
Next we can print the graph of the model with `print(vulkan_model.graph)`:
- before this diff: `conv1d` was used
```
graph(%self.1 : __torch__.___torch_mangle_16.MyModel,
%x : Tensor):
%60 : Device = prim::Constant[value="cpu"]()
%self.conv1d.bias : Float(33, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
%37 : bool = prim::Constant[value=0]()
%36 : NoneType = prim::Constant()
%59 : Device = prim::Constant[value="vulkan"]()
%self.conv1d.weight : Float(33, 16, 3, strides=[48, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
%7 : int = prim::Constant[value=1](), scope: __module.conv1d # /mnt/xarfuse/uid-23453/243f3953-seed-nspid4026532834_cgpid7972545-ns-4026532831/torch/nn/modules/conv.py:306:0
%18 : int[] = prim::Constant[value=[1]]()
%19 : int[] = prim::Constant[value=[0]]()
%39 : Tensor = aten::to(%x, %59, %36, %37, %37)
%20 : Tensor = aten::conv1d(%39, %self.conv1d.weight, %self.conv1d.bias, %18, %19, %18, %7)
%58 : Tensor = aten::to(%20, %60, %36, %37, %37)
return (%58)
```
- after this diff: `conv1d` was replaced with `run_conv1d_context`
```
graph(%self.1 : __torch__.___torch_mangle_6.MyModel,
%x : Tensor):
%85 : Device = prim::Constant[value="cpu"]()
%51 : bool = prim::Constant[value=0]()
%50 : NoneType = prim::Constant()
%84 : Device = prim::Constant[value="vulkan"]()
%53 : Tensor = aten::to(%x, %84, %50, %51, %51)
%prepack_folding_forward._jit_pass_packed_weight_0 : __torch__.torch.classes.vulkan.Conv1dPackedContext = prim::GetAttr[name="prepack_folding_forward._jit_pass_packed_weight_0"](%self.1)
%22 : Tensor = vulkan_prepack::run_conv1d_context(%53, %prepack_folding_forward._jit_pass_packed_weight_0)
%83 : Tensor = aten::to(%22, %85, %50, %51, %51)
return (%83)
```
Differential Revision: D52865379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117780
Approved by: https://github.com/yipjustin
Summary:
`Layernorm` has two arguments, `weight` and `bias`, which are stored as constant tensors on the CPU and transferred to the GPU at every inference call. We create a context for this op to avoid the repeated transfer. Specifically, we
- created `create_layernorm_context` and `run_layernorm_context` in `Layernorm.h` and `Layernorm.cpp`
- registered them in `Register.cpp`
- rewrote the graph representation of the op in `vulkan_rewrite.cpp`
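For reference, the `weight` and `bias` being packed are just the affine parameters of layer norm; a plain eager-mode check (nothing Vulkan-specific) of what they do:
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
weight, bias = torch.randn(10), torch.randn(10)
eps = 1e-5

y_ref = F.layer_norm(x, [10], weight, bias, eps)

# normalize over the last dimension, then apply the affine parameters
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, unbiased=False, keepdim=True)  # biased variance
y = (x - mean) / torch.sqrt(var + eps) * weight + bias
assert torch.allclose(y, y_ref, atol=1e-6)
```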
Test Plan:
## Numerical test
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (b6ccc956c)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*layer_norm*"
Recommended: For faster builds try buck2: replace 'buck' with 'buck2'
NOTE: buck-out/ has changed: look for files in fbsource/buck-out/v2/
'buck2 build --show-output //xplat/caffe2:pt_vulkan_api_test_bin' will print the new output paths.
If you are building in fbsource//xplat and have questions, post in 'Cross Platform Dev Discussions': https://fb.workplace.com/groups/xplat.qa
Targets matching .buckconfig buck2.supported_projects:
{'//xplat/caffe2:pt_vulkan_api_test_bin': '//xplat'}
To suppress this warning: touch ~/.config/.dont_hint_buck2
Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
Total time: 0.2 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *layer_norm*
[==========] Running 10 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 10 tests from VulkanAPITest
[ RUN ] VulkanAPITest.packed_layer_norm_2d
[ OK ] VulkanAPITest.packed_layer_norm_2d (342 ms)
[ RUN ] VulkanAPITest.packed_layer_norm_3d
[ OK ] VulkanAPITest.packed_layer_norm_3d (284 ms)
[ RUN ] VulkanAPITest.packed_layer_norm_4d
[ OK ] VulkanAPITest.packed_layer_norm_4d (5 ms)
[ RUN ] VulkanAPITest.layer_norm_invalid_inputs
[ OK ] VulkanAPITest.layer_norm_invalid_inputs (28 ms)
[ RUN ] VulkanAPITest.layer_norm_2d
[ OK ] VulkanAPITest.layer_norm_2d (1 ms)
[ RUN ] VulkanAPITest.layer_norm_3d
[ OK ] VulkanAPITest.layer_norm_3d (2 ms)
[ RUN ] VulkanAPITest.layer_norm_4d
[ OK ] VulkanAPITest.layer_norm_4d (4 ms)
[ RUN ] VulkanAPITest.native_layer_norm_2d
[ OK ] VulkanAPITest.native_layer_norm_2d (1 ms)
[ RUN ] VulkanAPITest.native_layer_norm_3d
[ OK ] VulkanAPITest.native_layer_norm_3d (2 ms)
[ RUN ] VulkanAPITest.native_layer_norm_4d
[ OK ] VulkanAPITest.native_layer_norm_4d (6 ms)
[----------] 10 tests from VulkanAPITest (679 ms total)
[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (679 ms total)
[ PASSED ] 10 tests.
```
Full test results are in P888496077; summary below:
```
[----------] 419 tests from VulkanAPITest (21652 ms total)
[----------] Global test environment tear-down
[==========] 419 tests from 1 test suite ran. (21652 ms total)
[ PASSED ] 418 tests.
[ SKIPPED ] 1 test, listed below:
[ SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
```
## Graph representation comparison
We created a model using `layer_norm` and traced it as shown below:
```
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer_norm = torch.nn.LayerNorm(normalized_shape=10)

    def forward(self, x):
        return self.layer_norm(x)

# Create an instance of the model
model = MyModel()
# Create a dummy input tensor for tracing
input_tensor = torch.randn(1, 10)
# Use torch.jit.trace to trace the model and generate a graph
traced_model = torch.jit.trace(model, input_tensor)
```
Then we converted the traced model to the Vulkan backend using `optimize_for_mobile`:
```
from torch.utils import mobile_optimizer

to_preserve = []  # no extra methods need preserving for this toy model
vulkan_model = mobile_optimizer.optimize_for_mobile(
    traced_model, backend="vulkan", preserved_methods=to_preserve
)
```
Then we can print the graph of the model with `print(vulkan_model.graph)`:
- Before this diff
```
%4 : bool = prim::Constant[value=1](), scope: __module.layer_norm # /mnt/xarfuse/uid-602118/33e18f68-seed-nspid4026531836_cgpid32066351-ns-4026531840/torch/nn/functional.py:2546:0
%5 : float = prim::Constant[value=1.0000000000000001e-05](), scope: __module.layer_norm # /mnt/xarfuse/uid-602118/33e18f68-seed-nspid4026531836_cgpid32066351-ns-4026531840/torch/nn/functional.py:2546:0
%14 : int[] = prim::Constant[value=[10]]()
%33 : Tensor = aten::to(%x, %53, %30, %31, %31)
%10 : Tensor = aten::layer_norm(%33, %14, %self.layer_norm.weight, %self.layer_norm.bias, %5, %4), scope: __module.layer_norm # /mnt/xarfuse/uid-602118/33e18f68-seed-nspid4026531836_cgpid32066351-ns-4026531840/torch/nn/functional.py:2546:0
```
- After this diff
```
%14 : int[] = prim::Constant[value=[10]]()
%47 : Tensor = aten::to(%x, %78, %44, %45, %45)
%16 : Tensor = vulkan_prepack::run_layernorm_context(%47, %14, %17)
```
Reviewed By: SS-JIA
Differential Revision: D51530478
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114701
Approved by: https://github.com/yipjustin
Summary: Add quantized linear for Vulkan to the custom ops so it can be used from a model.
Test Plan:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource -c pt.vulkan_full_precision=1
//xplat/caffe2/fb/custom_ops/vulkan_quantized:pt_vulkan_quantized_test_binAppleMac\#macosx-arm64
[ OK ] VulkanAPITest.convert_qconv2d_context (135 ms)
[ RUN ] VulkanAPITest.linear_2d
[ OK ] VulkanAPITest.linear_2d (4 ms)
[----------] 2 tests from VulkanAPITest (139 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (139 ms total)
[ PASSED ] 2 tests.
##############################################################
buck2 build --target-platforms ovr_config//platform/macos:arm64-fbsource
//xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
buck-out/v2/gen/fbsource/xplat/caffe2/pt_vulkan_quantized_api_test_binAppleMac
[ OK ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_int8_int32 (11 ms)
[ RUN ] VulkanAPITest.linear_2d_flat
[ OK ] VulkanAPITest.linear_2d_flat (4 ms)
[ RUN ] VulkanAPITest.linear_2d_small
[ OK ] VulkanAPITest.linear_2d_small (1 ms)
[ RUN ] VulkanAPITest.linear_2d_large
[ OK ] VulkanAPITest.linear_2d_large (1 ms)
[ RUN ] VulkanAPITest.linear_3d_flat
[ OK ] VulkanAPITest.linear_3d_flat (2 ms)
[ RUN ] VulkanAPITest.linear_3d_small
[ OK ] VulkanAPITest.linear_3d_small (2 ms)
[ RUN ] VulkanAPITest.linear_3d_large
[ OK ] VulkanAPITest.linear_3d_large (1 ms)
[ RUN ] VulkanAPITest.linear_4d_flat
[ OK ] VulkanAPITest.linear_4d_flat (1 ms)
[ RUN ] VulkanAPITest.linear_4d_small
[ OK ] VulkanAPITest.linear_4d_small (1 ms)
[ RUN ] VulkanAPITest.linear_4d_large
[ OK ] VulkanAPITest.linear_4d_large (1 ms)
[ RUN ] VulkanAPITest.linear_custom
[ OK ] VulkanAPITest.linear_custom (0 ms)
[----------] 76 tests from VulkanAPITest (1811 ms total)
[----------] Global test environment tear-down
[==========] 76 tests from 1 test suite ran. (1811 ms total)
[ PASSED ] 76 tests.
YOU HAVE 8 DISABLED TESTS
##############################################################
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1
[----------] Global test environment tear-down
[==========] 346 tests from 1 test suite ran. (5648 ms total)
[ PASSED ] 345 tests.
[ SKIPPED ] 1 test, listed below:
[ SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
YOU HAVE 5 DISABLED TESTS
Reviewed By: manuelcandales
Differential Revision: D49609985
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111148
Approved by: https://github.com/yipjustin
Summary:
This diff registers the Vulkan quantized binary ops (add/sub/mul/div), and adds graph rewrites for quantized add, mul, conv2d and conv2d_relu.
The rewrites for conv2d and conv2d_relu make use of `convert_qconv2d_context`, introduced in D41595032.
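For context, these are the eager-mode CPU counterparts of the quantized binary ops being registered; note that the output scale and zero_point are explicit arguments, which the graph rewrite has to carry through:
```
import torch

a, b = torch.randn(2, 3), torch.randn(2, 3)
qa = torch.quantize_per_tensor(a, scale=0.1, zero_point=10, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.1, zero_point=10, dtype=torch.quint8)

# quantized add takes an explicit output scale and zero_point
qc = torch.ops.quantized.add(qa, qb, 0.2, 10)
print(qc.dequantize())
```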
Test Plan: export quantized mcs model to vulkan
Reviewed By: SS-JIA
Differential Revision: D44189363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97468
Approved by: https://github.com/SS-JIA
Summary:
Avoid dereferencing element [0] if the vector is empty.
___
In `transferInputOutputBackends`, one of the Vulkan rewrite passes for `optimize_for_mobile`, an out-of-bounds access happens when trying to insert a backend transfer for an input whose `uses()` is empty. This diff corrects that issue.
Test Plan:
Run tests
___
Phabricator + CI Tests
Reviewed By: SS-JIA
Differential Revision: D41296037
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92918
Approved by: https://github.com/SS-JIA, https://github.com/kirklandsign
Summary:
This diff fixes several issues in the GRU and LSTM Vulkan ops:
- Added `create_gru_context` and `create_lstm_context` to `vulkanFoldPrePackingOps`
- Added a filter to `insertPrePackedGruOp` and `insertPrePackedLstmOp` to avoid matching `gru.data` and `lstm.data` usages
- Fixed the output dimension of GRU and LSTM
- Allowed `batch_first` to be false when batch=1 and seq=1 (see the check below)
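The `batch_first` relaxation is safe because at batch=1 and seq=1 the (N, L, H) and (L, N, H) layouts coincide; a quick eager-mode check with stock torch.nn:
```
import torch

torch.manual_seed(0)
gru_bf = torch.nn.GRU(input_size=4, hidden_size=3, batch_first=True)
gru_sf = torch.nn.GRU(input_size=4, hidden_size=3, batch_first=False)
gru_sf.load_state_dict(gru_bf.state_dict())  # identical weights

# with N=1 and L=1, (N, L, H_in) and (L, N, H_in) are the same tensor layout
x = torch.randn(1, 1, 4)
out_bf, h_bf = gru_bf(x)
out_sf, h_sf = gru_sf(x)
assert torch.allclose(out_bf, out_sf) and torch.allclose(h_bf, h_sf)
```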
Test Plan:
Check that optimize_for_mobile runs and correctly folds the create context ops
```
buck run :export_for_mobile ~/ferraris/ferraris.ptl ~/ferraris
```
Check that vulkan api tests are still passing
```
buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64
```
Reviewed By: SS-JIA
Differential Revision: D38811967
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83722
Approved by: https://github.com/SS-JIA
Summary:
High level description of this diff:
- VulkanOpContext is eliminated
- LinearPackedContext, Conv2dPackedContext, GruPackedContext & LstmPackedContext are introduced.
- They are child classes of the virtual class VulkanPackedContext.
- Their purpose is to pack and unpack the context for each of those ops. They unpack the context on serialization, and pack it on deserialization.
- They differ from the old op-specific contexts (LinearOpContext, Conv2dOpContext, etc.) in two important ways: they only store the packed data, and they do not contain the logic for running the op. (In this diff, the unpack functions for LinearPackedContext and Conv2dPackedContext haven't been implemented yet, so we cheat by including a private unpacked_ list inside each; a future diff will implement unpacking for those two classes and remove that private list.)
- The old LinearOpContext, GruOpContext & LstmOpContext are completely eliminated. Conv2dOpContext is maintained for backwards compatibility, but it is just a wrapper around Conv2dPackedContext.
- A lot of code from Convolution.cpp was repeated in TransposeConvolution2d.cpp and QuantizedConvolution.cpp, so the logic was combined, introducing transposed and quantized flags where appropriate, and everything was moved to Convolution.cpp & Convolution.h.
- The top level convolution functions defined in Register.cpp are moved to Convolution.cpp.
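A rough Python analogy of the new layout (the real classes are C++, and the packing transform shown here is invented purely for illustration):
```
from abc import ABC, abstractmethod
from typing import Any, List
import torch

class VulkanPackedContext(ABC):
    """Analogy of the virtual base: subclasses hold only packed data."""
    @abstractmethod
    def unpack(self) -> List[Any]:
        """Recover the unpacked arguments (needed when serializing)."""

class LinearPackedContext(VulkanPackedContext):
    def __init__(self, weight: torch.Tensor, bias: torch.Tensor):
        # pack on construction (the deserialization path); the transpose
        # here is an invented stand-in for the real packing transform
        self.packed = (weight.t().contiguous(), bias)

    def unpack(self) -> List[Any]:
        w_packed, bias = self.packed
        return [w_packed.t(), bias]

ctx = LinearPackedContext(torch.randn(3, 4), torch.randn(3))
weight, bias = ctx.unpack()
```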
Test Plan:
Run vulkan_api_test
- On Mac:
```
buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac
```
- On Android:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Run vulkan_quantized_api_test
- On Mac:
```
buck run //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac
```
- On Android:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_quantized_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_quantized_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_quantized_api_test
adb shell "/data/local/tmp/vulkan_quantized_api_test"
```
Reviewed By: SS-JIA
Differential Revision: D38363981
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82730
Approved by: https://github.com/SS-JIA
Summary:
Optimized the LSTM operator by using pre-packing for weights and biases in the Vulkan GPU backend:
- The weights and biases are always on the CPU side by design.
- The packed and unpacked data are stored in a VulkanOpContext
- Ops:
- `at::native::vulkan::ops::create_lstm_context`: Creates a VulkanOpContext object with the packed and unpacked data, and returns a pointer to it.
- `at::native::vulkan::ops::run_lstm_context`: Takes in the three input vulkan tensors (input sequence, initial hidden state and initial cell state) and a pointer to the context, and runs the LSTM operation.
- Registered the ops in [Register.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Register.cpp).
- Rewrote the subgraph function of LSTM in [vulkan_rewrite.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/vulkan_rewrite.cpp) so that `create_lstm_context` and `run_lstm_context` are executed instead on the Vulkan GPU backend.
- Added new test for the LSTM pre-packing and run ops: `lstm_prepack_success`
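In eager-mode terms, the three tensors that `run_lstm_context` consumes are the standard LSTM inputs; the sizes below are arbitrary:
```
import torch

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
x  = torch.randn(1, 5, 8)    # input sequence (N, L, H_in)
h0 = torch.zeros(1, 1, 16)   # initial hidden state (num_layers, N, H_out)
c0 = torch.zeros(1, 1, 16)   # initial cell state
out, (hn, cn) = lstm(x, (h0, c0))
print(out.shape, hn.shape, cn.shape)  # (1, 5, 16), (1, 1, 16), (1, 1, 16)
```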
Test Plan: buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac
Reviewed By: SS-JIA
Differential Revision: D37052597
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79702
Approved by: https://github.com/SS-JIA
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73599
Optimized GRU operator by using pre-packing for weights and biases in the Vulkan GPU backend:
* The weights and biases are always on the CPU side by design.
* To avoid the overhead of retrieving the weight and bias tensors on every call, the best approach is to store them pre-packed.
* A custom op context `GruOpContext` (derived from `torch::jit::CustomClassHolder`) is created to hold both packed and unpacked data. The unpacked_ struct represents the data needed to construct the op context; this data is pre-packed and stored in the packed_ struct. The constructor of the `GruOpContext` loads the data into both structs.
* The `at::native::vulkan::ops::gru_prepack` and `at::native::vulkan::ops::gru_run` methods use the op context. `gru_prepack` takes in whatever data is needed to construct the op context and returns a pointer to the created context. `gru_run` takes input tensors and a pointer to the op context, and uses the data stored in the context to process the inputs.
* Lastly, we need to register the op context class and ops in [Register.cpp](11dc158129/aten/src/ATen/native/vulkan/ops/Register.cpp), and rewrite the subgraph function of the GRU op in [vulkan_rewrite.cpp](11dc158129/torch/csrc/jit/passes/vulkan_rewrite.cpp) so that the `gru_prepack` and `gru_run` ops are executed instead on the Vulkan GPU backend.
* To avoid `"Undefined symbols for architecture x86_64"` compiler error on the x86_64 platform, `c10::Dispatcher::callBoxed()` API is used to call `vulkan_prepack::gru_prepack` and `vulkan_prepack::gru_run` by name. Otherwise, the test methods can't resolve the symbols.
* Added new tests for the GRU pre-packing and run operations: `gru_prepack_success` and `gru_prepack_invalidinputs_exceptions`
* To build your PyTorch OSS on your local machine:
```
python setup.py clean
git submodule update --init --recursive
USE_VULKAN=1 USE_VULKAN_FP16_INFERENCE=1 python3 setup.py install --cmake
python setup.py develop && python -c "import torch"
```
* To run and dump a model containing GRU operators in Python:
```
import torch
from torch.utils import mobile_optimizer
model = torch.jit.load("Mclaren_traced.pt")
vk_model = mobile_optimizer.optimize_for_mobile(model, backend="vulkan")
print(vk_model.graph)
```
* The following TorchScript is the updated version after GRU pre-packing:
```
%15 : Tensor[] = prim::ListConstruct(%weight_ih_l0.1, %weight_hh_l0.1, %bias_ih_l0.1, %bias_hh_l0.1, %weight_ih_l1.1, %weight_hh_l1.1, %bias_ih_l1.1, %bias_hh_l1.1)
%19 : __torch__.torch.classes.vulkan.GruOpContext = vulkan_prepack::gru_prepack(%15, %4, %5, %6, %3, %3, %4)
%20 : Tensor, %21 : Tensor = vulkan_prepack::gru_run(%input.1, %hx.1, %19)
%18 : (Tensor, Tensor) = prim::TupleConstruct(%21, %20)
return (%18)
```
* This implementation has some limitations:
* Tensor dim should be 3 for input sequence and hidden state.
* has_biases=True
* train=False
* bidirectional=False
* batch_first=True
* dropout=0.0
* D=1 since bidirectional=False
* N=1 (batch size)
* L=1 (sequence length)
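An eager-mode configuration that satisfies all of these constraints (two layers, matching the eight weight/bias tensors in the prepack example above):
```
import torch

gru = torch.nn.GRU(input_size=4, hidden_size=3, num_layers=2,
                   bias=True, batch_first=True,
                   bidirectional=False, dropout=0.0)
x  = torch.randn(1, 1, 4)   # 3-dim input, N=1, L=1
h0 = torch.zeros(2, 1, 3)   # (D * num_layers, N, H_out) with D=1
out, hn = gru(x, h0)
assert out.shape == (1, 1, 3) and hn.shape == (2, 1, 3)
```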
Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS (x86_64):
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
Running main() from gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from VulkanAPITest
[ RUN ] VulkanAPITest.gru_mclareninputs_success
[ OK ] VulkanAPITest.gru_mclareninputs_success (1037 ms)
[ RUN ] VulkanAPITest.gru_invalidinputs_exceptions
[ OK ] VulkanAPITest.gru_invalidinputs_exceptions (16 ms)
[ RUN ] VulkanAPITest.gru_prepack_success
[ OK ] VulkanAPITest.gru_prepack_success (45 ms)
[ RUN ] VulkanAPITest.gru_prepack_invalidinputs_exceptions
[ OK ] VulkanAPITest.gru_prepack_invalidinputs_exceptions (16 ms)
[----------] 4 tests from VulkanAPITest (1114 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (1114 ms total)
[ PASSED ] 4 tests.
```
Test result on MacOS (x86_64):
```
Running main() from gtest_main.cc
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from VulkanAPITest
[ RUN ] VulkanAPITest.gru_mclareninputs_success
[ OK ] VulkanAPITest.gru_mclareninputs_success (1012 ms)
[ RUN ] VulkanAPITest.gru_invalidinputs_exceptions
[ OK ] VulkanAPITest.gru_invalidinputs_exceptions (40 ms)
[ RUN ] VulkanAPITest.gru_prepack_success
[ OK ] VulkanAPITest.gru_prepack_success (99 ms)
[ RUN ] VulkanAPITest.gru_prepack_invalidinputs_exceptions
[ OK ] VulkanAPITest.gru_prepack_invalidinputs_exceptions (39 ms)
[----------] 4 tests from VulkanAPITest (1190 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (1190 ms total)
[ PASSED ] 4 tests.
```
Reviewed By: SS-JIA
Differential Revision: D34556940
fbshipit-source-id: dce918de238fb8a4a0ea5e966e05ca99ed910c28
(cherry picked from commit cd1d95ff8d0fa7810cf18a54ba64539e46daa26a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73872
This diff adds an equivalent target for [`aten_vulkan`](https://fburl.com/code/h9ybej5u) in FBCode as the `ATen-vulkan` target, by creating fbcode equivalents of all the xplat targets needed to build `aten_vulkan`.
The following targets in `xplat/caffe2` have had equivalent targets created in `fbcode/caffe2/aten`:
* `aten_vulkan_glsl_src_path`
* filegroup containing all Vulkan glsl files
* `gen_aten_vulkan_spv_lib`
* python library containing script to generate vulkan spv files
* `gen_aten_vulkan_spv_bin`
* python binary wrapping the above target
* `gen_aten_vulkan_spv`
* genrule to execute the above python script and create C++ headers containing the SPIR-V shader code
* `generated_aten_headers_vulkan`
* C++ library that points to the generated SPIR-V headers from above
* `aten_vulkan`
* Contains the Pytorch Vulkan backend
FBCode targets have also been added for:
* `Vulkan-Headers` which contains Vulkan API function signatures
* `vulkan_wrapper` which loads the vulkan library
* `dotslash:glslc` which wraps the glsl compiler in a target that can be executed by genrules
Test Plan:
Try building the new `ATen-vulkan` target:
```
cd fbsource/fbcode/caffe2/aten
buck build :ATen-vulkan
```
Also tested in the next diff which tries to use this target in a Python script in FBCode.
Reviewed By: beback4u
Differential Revision: D34647445
fbshipit-source-id: 7330df1e3858c88b934b06e8e75f4fdcfa88068e
(cherry picked from commit 25251bed83e97bb9ef96a5f611c6ed72ba4219fc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73243
The previous version of the Vulkan backend is no longer being used. Delete the dead code from the codebase.
Test Plan: Make sure everything still builds.
Reviewed By: beback4u
Differential Revision: D34400045
fbshipit-source-id: ae2a61452bf9199c11d81cc0369de8a9dd6692b1
(cherry picked from commit 22ee917f05f36ed16226dbe79da7892426eb09a2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61646
There are several passes which are written to handle both
`CallFunction("linear", ...)` and `aten::linear(...)` despite the two being
functionally identical.
This changes `FuseLinear` to also normalize the `CallFunction` variant to
`aten::linear`. That way each subsequent transformation only has to handle one
form instead of both.
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33754261
Pulled By: albanD
fbshipit-source-id: 42465cea790538481efc881a249dafdda4bba5d4
(cherry picked from commit ebeca9434c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56070
**Summary**
Currently, we're returning copies instead of aliases on mobile GPU (Metal/Vulkan). As suggested by ailzhang, we could use the JIT pass `RemoveTensorMutation` to ban mutations ahead of time. I've tested two scenarios as shown below. They both work fine on mobile.
- view
```
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        y = x.view(-1)
        z = torch.tensor(2.0).float()
        y.add_(z)
        return x

m = Model()
x = torch.rand(2, 3)
y = m(x)
```
- transpose
```
class Model(torch.nn.Module):
    def forward(self, x):
        y = x.transpose(1, 2)
        z = torch.tensor(2.0).float()
        x.add_(z)
        return y

m = Model()
x = torch.rand(1, 2, 3)
y = m(x)
```
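The pass can also be exercised directly from Python; a minimal sketch using the internal `torch._C._jit_pass_remove_mutation` hook (an internal API that may change between releases):
```
import torch

class Model(torch.nn.Module):  # the "view" example from above
    def forward(self, x):
        y = x.view(-1)
        z = torch.tensor(2.0).float()
        y.add_(z)
        return x

scripted = torch.jit.script(Model())
torch._C._jit_pass_remove_mutation(scripted.graph)
# in-place ops are rewritten to out-of-place form only where the pass
# can prove the program's observable behavior is unchanged
print(scripted.graph)
```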
As we're adding more ops, we should add more tests to cover all the alias ops - https://github.com/pytorch/pytorch/blob/master/tools/autograd/gen_inplace_or_view_type.py#L31-L80
**Next step**
Synced offline with eellison. Since mutation removal is also being used in ONNX, Static runtime, some jit optimizations, Torch -> TVM, etc, instead of inventing something new, we would continue to make it better in cases where it fails.
Although this JIT pass could work for most of the mobile models, there are cases that it can't cover. What we're going to do next is to implement stub ops for GPU models to let them run on server side, such that users can compare results to see if there is any discrepancy.
ghstack-source-id: 126802123
Test Plan:
- Sandcastle
- CircleCI
Reviewed By: raziel
Differential Revision: D27692683
fbshipit-source-id: 9d1be8a6c0a276032b1907807a54fbe2afd882f9