Mirror of https://github.com/zebrajr/opencv.git (synced 2025-12-06 12:19:50 +01:00)
dnn: added NEON intrinsics implementation of fastGEMM1T function #27785

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch

This PR improves the performance of the LSTM layer on ARM64 targets.

- Added a NEON intrinsics implementation of the fastGEMM1T function and enabled its use in the fully connected and recurrent layer files.
- As a result, ARM64 now benefits from vectorized matrix–vector multiplications, leading to measurable performance improvements in the LSTM layer.
- This change is limited to ARM64 and does not affect other architectures.

**Performance impact:**

- The optimization significantly improves the performance of LSTM functions on ARM64 targets.

<img width="930" height="313" alt="image" src="https://github.com/user-attachments/assets/92e251cd-dc6c-4cda-9586-acc19bf16dfd" />

| Name |
|---|
| cmake |
| include/opencv2 |
| misc |
| perf |
| src |
| test |
| CMakeLists.txt |
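
The commit message describes a vectorized matrix–vector multiplication on ARM64. As a rough illustration of that idea only, the sketch below computes `dst[i] = dot(vec, row_i) + bias[i]` with NEON intrinsics on AArch64 and falls back to a scalar loop elsewhere. The function name `gemv1T_sketch`, its parameter list, and the single-accumulator loop are assumptions made for this example; this is not the code added in the PR, which may unroll, block, or dispatch differently.

```cpp
#include <cstddef>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not OpenCV's fastGEMM1T): for each of the nvecs rows of
// `weights` (row stride `wstep`, in floats), compute the dot product with `vec`
// (length vecsize) and add the matching bias element.
static void gemv1T_sketch(const float* vec, const float* weights, std::size_t wstep,
                          const float* bias, float* dst, int nvecs, int vecsize)
{
    for (int i = 0; i < nvecs; ++i)
    {
        const float* w = weights + i * wstep;
        float sum = 0.f;
        int k = 0;
#if defined(__aarch64__)
        // Process four floats per iteration with a fused multiply-add.
        float32x4_t acc = vdupq_n_f32(0.f);
        for (; k + 4 <= vecsize; k += 4)
            acc = vfmaq_f32(acc, vld1q_f32(vec + k), vld1q_f32(w + k));
        sum = vaddvq_f32(acc);  // horizontal sum of the four lanes
#endif
        // Scalar tail on AArch64; the whole row on other architectures.
        for (; k < vecsize; ++k)
            sum += vec[k] * w[k];
        dst[i] = sum + bias[i];
    }
}
```

Guarding the intrinsics behind `__aarch64__` mirrors the commit's statement that the change is limited to ARM64: other targets compile only the scalar loop and produce the same results up to floating-point summation order.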