opencv/modules/dnn
pratham-mcw 8f3976ae97
Merge pull request #27785 from pratham-mcw:dnn-lstm-neon
dnn: added neon intrinsics implementation of fastGEMM1T function #27785

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch

- This PR improves the performance of the LSTM layer on ARM64 targets.
- Added a NEON intrinsics implementation of the fastGEMM1T function and enabled its use in the fully connected and recurrent layer files.
- As a result, ARM64 now benefits from vectorized matrix–vector multiplications, leading to measurable performance improvements in the LSTM layer.
- This change is limited to ARM64 and does not affect other architectures.
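
For readers unfamiliar with the technique, the following is a minimal sketch of a NEON-vectorized matrix–vector product. It is not the code from this PR. The helper name `gemv_rows_sketch` is hypothetical, and its parameter list and the row-wise layout (each output is the dot product of the input vector with one weight row, plus a bias) are assumptions made for illustration. The NEON path is compiled only on AArch64; other targets use the scalar loop.

```cpp
#include <cstddef>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1
#else
#define SKETCH_HAS_NEON 0
#endif

// Hypothetical helper (not the OpenCV function):
//   dst[i] = dot(vec, weights + i * wstep) + bias[i]   for i in [0, nvecs)
// wstep is the distance in floats between consecutive weight rows (assumed layout).
static void gemv_rows_sketch(const float* vec, const float* weights, size_t wstep,
                             const float* bias, float* dst, int nvecs, int vecsize)
{
    for (int i = 0; i < nvecs; i++)
    {
        const float* w = weights + i * wstep;
        float sum = 0.f;
        int k = 0;
#if SKETCH_HAS_NEON
        // Accumulate four products per iteration with a fused multiply-add.
        float32x4_t acc = vdupq_n_f32(0.f);
        for (; k <= vecsize - 4; k += 4)
            acc = vfmaq_f32(acc, vld1q_f32(vec + k), vld1q_f32(w + k));
        // Horizontal add of the four lanes into one scalar.
        sum = vaddvq_f32(acc);
#endif
        // Scalar tail (or the whole row when NEON is unavailable).
        for (; k < vecsize; k++)
            sum += vec[k] * w[k];
        dst[i] = sum + bias[i];
    }
}
```

A production kernel would typically process several rows per iteration to reuse the loaded input vector; this sketch keeps one row at a time for clarity.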

**Performance impact:**
- The optimization significantly improves the performance of the LSTM layer on ARM64 targets; the measured results are shown in the image below.
<img width="930" height="313" alt="image" src="https://github.com/user-attachments/assets/92e251cd-dc6c-4cda-9586-acc19bf16dfd" />
2025-10-03 10:50:50 +03:00
..
cmake Add Definition "_USE_MATH_DEFINES" for dnn plugin on Win32 build 2024-04-07 21:08:09 +09:00
include/opencv2 pre: OpenCV 4.12.0 (version++). 2025-06-19 11:03:59 +03:00
misc Add Java wrapper support for List<List<MatShape>> 2025-08-25 02:09:19 +09:00
perf Merge pull request #26127 from alexlyulkov:al/blob-from-images 2024-12-23 10:04:34 +03:00
src Merge pull request #27785 from pratham-mcw:dnn-lstm-neon 2025-10-03 10:50:50 +03:00
test Higher threshold for ViT on OpenVINO 2025-05-21 09:31:40 +03:00
CMakeLists.txt Merge pull request #27785 from pratham-mcw:dnn-lstm-neon 2025-10-03 10:50:50 +03:00