Compare commits

306 Commits

Author SHA1 Message Date
Alexander Smorkalov
b3cb517cac
Merge pull request #27385 from CodeLinaro:doc_update
Updating doc markdown to include API in FastCV gemm HAL
2025-05-30 15:24:15 +03:00
Liane Lin
8a0ea789e7
Merge pull request #27149 from liane-lin:4.x
Fix #25696: Solved the problem in Subdiv2D, empty delaunay triangulation #27149

Detailed description

Expected behaviour:
Given 4 points, where no three points are collinear, the Delaunay Triangulation Algorithm should return 2 triangles.

Actual:
The algorithm returns zero triangles in this particular case.

Fix:
The radius of the circumcircle tends to infinity as the points approach collinearity, so the problem occurs because the super-triangle is not large enough, which results in certain edges not being swapped. The proposed solution simply enlarges the super-triangle, for example by doubling the value of the constant.
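
The growth of the circumcircle can be seen with a small standalone sketch. It does not use `Subdiv2D`; the helper `circumradius` and the sample points are hypothetical and only illustrate why a fixed-size super-triangle can become too small for nearly collinear input.

```python
import math

def circumradius(a, b, c):
    """Circumradius of triangle abc via R = |ab| * |bc| * |ca| / (4 * area)."""
    ab = math.dist(a, b)
    bc = math.dist(b, c)
    ca = math.dist(c, a)
    # Twice the signed area, from the 2D cross product of (b - a) and (c - a).
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    area = abs(cross) / 2.0
    if area == 0.0:
        return math.inf  # exactly collinear: no finite circumcircle
    return ab * bc * ca / (4.0 * area)

# The middle point sinks toward the segment (0,0)-(2,0) as h shrinks.
heights = [1.0, 1e-1, 1e-3, 1e-6]
radii = [circumradius((0.0, 0.0), (1.0, h), (2.0, 0.0)) for h in heights]

for h, r in zip(heights, radii):
    print(f"h = {h:g}: circumradius = {r:.6g}")
```

For this family of points the radius is `(1 + h^2) / (2 h)`, so it grows without bound as `h` goes to zero.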

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-30 15:20:01 +03:00
Aditi Sharma
d26aa99a05 Updating doc markdown to include API in FastCV gemm HAL 2025-05-30 17:01:07 +05:30
Alexander Smorkalov
9f7793cdae
Merge pull request #27375 from CodeLinaro:doc_update
Updating doc markdown to include newly added FastCV HAL and Extension…
2025-05-28 16:06:42 +03:00
Aditi Sharma
47d0bedeb2 Updating doc markdown to include newly added FastCV HAL and Extension APIs 2025-05-28 15:43:04 +05:30
Myron Rodrigues
344f8c6400
Merge pull request #27363 from MRo47:openvino-npu-support
Feature: Add OpenVINO NPU support #27363

## Why
- OpenVINO now supports inference on integrated NPU devices in Intel's Core Ultra series processors.
- Sometimes as fast as GPU, but should use considerably less power.

## How
- The NPU plugin is now available as "NPU" in OpenVINO `ov::Core::get_available_devices()`.
- Removed the guards and checks for NPU in available targets for Inference Engine backend.

## Test example

### Pre-requisites
- Intel [Core Ultra series processor](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra/edge.html#tab-blade-1-0)
- [Intel NPU driver](https://github.com/intel/linux-npu-driver/releases)
- OpenVINO 2023.3.0+ (Tested on 2025.1.0)

### Example
```cpp
#include <opencv2/dnn.hpp>
#include <iostream>

int main(){
    cv::dnn::Net net = cv::dnn::readNet("../yolov8s-openvino/yolov8s.xml", "../yolov8s-openvino/yolov8s.bin");
    cv::Size net_input_shape = cv::Size(640, 480);
    std::cout << "Setting backend to DNN_BACKEND_INFERENCE_ENGINE and target to DNN_TARGET_NPU" << std::endl;
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_INFERENCE_ENGINE);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_NPU);

    cv::Mat image(net_input_shape, CV_8UC3);
    cv::randu(image, cv::Scalar(0, 0, 0), cv::Scalar(255, 255, 255));
    cv::Mat blob = cv::dnn::blobFromImage(
        image, 1, net_input_shape, cv::Scalar(0, 0, 0), true, false, CV_32F);
    net.setInput(blob);
    std::cout << "Running forward" << std::endl;
    cv::Mat result = net.forward();
    std::cout << "Output shape: " << result.size << std::endl; // Output shape: 1 x 84 x 6300
}
```

model files [here](https://limewire.com/d/bPgiA#BhUeSTBnMc)

docker image used to build opencv: [ghcr.io/mro47/opencv-builder](https://github.com/MRo47/opencv-builder/blob/main/Dockerfile)

Closes #26240

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-27 14:13:49 +03:00
Alexander Smorkalov
a23baceb06
Merge pull request #27367 from CodeLinaro:xuezha_markdown
update building_fastcv.markdown
2025-05-27 14:10:36 +03:00
Xue Zhang
17f399f475 update building_fastcv.markdown 2025-05-27 13:34:29 +05:30
Alexander Smorkalov
575e3adc01
Merge pull request #27345 from asmorkalov:as/android_sdk_fastcv
Fixed FastCV linkage in OpenCV4AndroidSDK
2025-05-25 10:13:00 +03:00
Alexander Smorkalov
7578007d23
Merge pull request #27356 from asmorkalov:as/ipp_hal_copyright
Added default copyright header to IPP HAL
2025-05-24 18:54:27 +03:00
Alexander Smorkalov
0806124ac7 Try to add FastCV to OpenCV4AndroidSDK 2025-05-24 18:49:11 +03:00
Alexander Smorkalov
b4944c9375 Added default copyright header to IPP HAL. 2025-05-24 16:57:49 +03:00
Alexander Smorkalov
0a5352ee27
Merge pull request #27346 from asmorkalov:as/ipp_hal_sum
New HAL entry for cv::sum and IPP adoption
2025-05-24 16:53:42 +03:00
Myron Rodrigues
374ad41420
Merge pull request #27353 from MRo47:fix/segfault-on-forward#27352
Fix #27352: Add checks before getting latest pin in Net::Impl::getLatestLayerPin() #27353 

### Pull Request Readiness Checklist

Fixes #27352 

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-24 10:07:45 +03:00
Maxim Smolskiy
023d14ecc4
Merge pull request #27347 from MaximSmolskiy:improve-solveCubic-accuracy
Improve solveCubic accuracy #27347

### Pull Request Readiness Checklist

Fix #27323 

```
2e-13 * x^3 + x^2 - 2 * x + 1 = 0 -> x^3 + 5e12 * x^2 - 1e13 * x + 5e12 = 0
```

The problem is that the coefficients have quite large magnitudes, so the current calculations are subject to round-off error:

```
Q = (a1 * a1 - 3 * a2) * (1./9)
R = (2 * a1 * a1 * a1 - 9 * a1 * a2 + 27 * a3) * (1./54)
Qcubed = Q * Q * Q = a1^6/729 - (a1^4 a2)/81 + (a1^2 a2^2)/27 - a2^3/27
R * R = R^2 = a1^6/729 - (a1^4 a2)/81 + (a1^2 a2^2)/36 + (a1^3 a3)/27 - (a1 a2 a3)/6 + a3^2/4
d = Qcubed - R * R
```

Suppose `a1`, `a2`, `a3` all have similarly large magnitudes. Then `Qcubed` and `R * R` share the terms `a1^6/729` and `-(a1^4 a2)/81`, which cancel in `d` but dwarf the remaining terms (these two terms have degree `6` and `5`, while the others have degree `4` or less).
So, if these terms take part in the calculation, they cause a huge round-off error.
If we expand the expression instead, the round-off error should be smaller:
```
d = Qcubed - R * R = 1/108 (a1^2 a2^2 - 4 a2^3 - 4 a1^3 a3 + 18 a1 a2 a3 - 27 a3^2)
```
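
A rough way to check this claim is to evaluate both forms in double precision and compare them with an exact rational reference. The sketch below is not the OpenCV implementation; the function names are hypothetical, and the coefficients are the normalized ones from the example equation above.

```python
from fractions import Fraction

def discriminant_naive(a1, a2, a3):
    """d = Q^3 - R^2, evaluated the way the formulas above are written."""
    q = (a1 * a1 - 3 * a2) * (1.0 / 9)
    r = (2 * a1 * a1 * a1 - 9 * a1 * a2 + 27 * a3) * (1.0 / 54)
    return q * q * q - r * r

def discriminant_expanded(a1, a2, a3):
    """Same d with the a1^6 and a1^4 * a2 terms cancelled symbolically."""
    return (a1 * a1 * a2 * a2 - 4 * a2 * a2 * a2 - 4 * a1 * a1 * a1 * a3
            + 18 * a1 * a2 * a3 - 27 * a3 * a3) / 108

def discriminant_exact(a1, a2, a3):
    """Reference value computed in exact rational arithmetic."""
    a1, a2, a3 = Fraction(a1), Fraction(a2), Fraction(a3)
    return (a1**2 * a2**2 - 4 * a2**3 - 4 * a1**3 * a3
            + 18 * a1 * a2 * a3 - 27 * a3**2) / 108

def relative_error(approx, exact):
    """Relative error of a float against an exact Fraction reference."""
    return float(abs(Fraction(approx) - exact) / abs(exact))

# x^3 + 5e12 * x^2 - 1e13 * x + 5e12 = 0
coeffs = (5e12, -1e13, 5e12)
exact = discriminant_exact(*coeffs)
rel_err_naive = relative_error(discriminant_naive(*coeffs), exact)
rel_err_expanded = relative_error(discriminant_expanded(*coeffs), exact)

print(f"relative error, Q^3 - R^2: {rel_err_naive:.3g}")
print(f"relative error, expanded:  {rel_err_expanded:.3g}")
```

For these coefficients the expanded form still subtracts two large, nearly equal terms (`a1^2 a2^2` and `4 a1^3 a3`), so it reduces the round-off error rather than eliminating it.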

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-05-24 09:56:48 +03:00
Alexander Smorkalov
388b6dd81f New HAL entry for cv::sum and IPP adoption. 2025-05-23 12:35:11 +03:00
Alexander Smorkalov
b8099d3cc2
Merge pull request #27348 from fengyuentau:4x/hal/riscv_rvv/faster_div_f32
hal/riscv-rvv: further optimize div
2025-05-23 11:31:16 +03:00
Alexander Smorkalov
5a457842f1
Merge pull request #27351 from mshabunin:fix-intrin-legacy-ops-2
core: legacy intrin operators - fixed version warning condition
2025-05-23 07:50:39 +03:00
Maksim Shabunin
e6fb6c290c core: legacy intrin operators - fixed version warning condition 2025-05-22 21:00:49 +03:00
Yuantao Feng
2c4eab0969 perf: speed up vfdiv by Newton-Raphson routine 2025-05-22 12:58:40 +08:00
Alexander Smorkalov
aee828ac6e
Merge pull request #27344 from asmorkalov:as/kleidicv_no_cv_namespace
Disabled cv namespace usage inside KleidiCV.
2025-05-21 16:13:48 +03:00
Yuantao Feng
c37f54aeed
Merge pull request #27343 from fengyuentau:4x/build/fix_more_warnings
build: fix more warnings from recent gcc versions after #27337 #27343

More fixes after https://github.com/opencv/opencv/pull/27337

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-21 16:12:09 +03:00
Alexander Smorkalov
7ecb1d8cab Disabled cv namespace usage inside KleidiCV. 2025-05-21 13:04:23 +03:00
omahs
0bc95d9256
Merge pull request #27338 from omahs:patch-1
Fix typos #27338

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-21 12:13:50 +03:00
Alexander Smorkalov
dc610867e1
Merge pull request #27342 from MaximSmolskiy:fix-bug-in-solvePoly-test
Fix bug in solvePoly test
2025-05-21 11:19:01 +03:00
Alexander Smorkalov
5177c4a25a
Merge pull request #27341 from dkurt:vit_ov_test
Higher threshold for ViT on OpenVINO
2025-05-21 11:00:52 +03:00
Alexander Smorkalov
4530206445
Merge pull request #27340 from asmorkalov:apreetam_5thPost
Update hash for the fastcv libs for both Linux and Android #27340

Replaces https://github.com/opencv/opencv/pull/27290
Updated libs PR: https://github.com/opencv/opencv_3rdparty/pull/95

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-21 10:24:13 +03:00
MaximSmolskiy
d4c4493413 Fix bug in solvePoly test 2025-05-21 09:31:43 +03:00
Dmitry Kurtaev
3ff6c7f9fe
Higher threshold for ViT on OpenVINO 2025-05-21 09:31:40 +03:00
Yuantao Feng
166f76d224
Merge pull request #27337 from fengyuentau:4x/build/riscv/fix_warnings
build: fix warnings from recent gcc versions #27337

This PR addresses the following found warnings:
- [x] -Wmaybe-uninitialized
- [x] -Wunused-variable
- [x] -Wsign-compare

Tested building with GCC 14.2 (RISC-V 64).

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-21 09:28:29 +03:00
Gursimar Singh
c3fe92d813
Merge pull request #27270 from gursimarsingh:bug_fix_unstable_crf
Bug fix unstable crf #27270

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

The PR resolves the issue of the triangle weights used by the Debevec algorithm being non-zero at the extremes.
It resolves #24966
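
For reference, a textbook triangle (hat) weighting that is zero at both extremes can be sketched as follows. This is an illustration under the assumption of 8-bit pixel levels; the function `triangle_weights` is hypothetical and is not necessarily the exact table OpenCV builds after this fix.

```python
def triangle_weights(levels=256):
    """Hat weighting: rises linearly from 0 at the darkest level,
    peaks around mid-range, and falls back to 0 at the brightest level."""
    z_min, z_max = 0, levels - 1
    mid = (z_min + z_max) / 2.0
    return [float(z - z_min) if z <= mid else float(z_max - z)
            for z in range(levels)]

weights = triangle_weights()
print("w(0) =", weights[0], " w(127) =", weights[127],
      " w(128) =", weights[128], " w(255) =", weights[255])
```

With zero weight at the extremes, saturated and fully dark pixels contribute nothing to the recovered response curve.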

The fix needs ground truth data to be changed in order to pass existing tests. PR to opencv_extra: https://github.com/opencv/opencv_extra/pull/1253
2025-05-21 08:40:11 +03:00
Maxim Smolskiy
d00738d97c
Merge pull request #27331 from MaximSmolskiy:add-test-for-solveCubic
Add tests for solveCubic #27331

### Pull Request Readiness Checklist

Related to #27323 

I found only randomized tests in which the number of roots is always `1` or `3`, a test for `x^3 = 0`, and some simple tests for Java and Swift.
Obviously, they don't cover all cases (the implementation branches heavily, and the number of roots can also be `-1`, `0` or `2`).
So I think it would be useful to explicitly cover more cases (and the corresponding implementation branches).

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-05-21 08:36:35 +03:00
Alexander Smorkalov
79a5e5276a
Merge pull request #27334 from fengyuentau:4x/imgproc/compareHist_chisqr_simd
imgproc: vectorize mode CHISQR and CHISQR_ALT in compareHist
2025-05-21 07:07:57 +03:00
Yuantao Feng
9b08167769
hal/imgproc: add hal for calcHist and implement in hal:riscv-rvv (#27332)
hal/imgproc: add hal for calcHist and implement in hal/riscv-rvv #27332

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-21 07:07:22 +03:00
Alexander Smorkalov
23f8e523a0
Merge pull request #27325 from VadimLevin/dev:vlevin/header-parser-conditional-inclusion-directives-handling
feat: add conditional inclusion support to header parser
2025-05-20 18:11:12 +03:00
Yuantao Feng
7fe8ce19d9 perf: vectorize mode CHISQR and CHISQR_ALT in compareHist 2025-05-19 17:09:33 +08:00
Dmitry Kurtaev
1e3ab44cff
Merge pull request #27307 from dkurt:tflite_face_blendshape_model
TFLite fixes for Face Blendshapes V2 #27307

### Pull Request Readiness Checklist

* Scalars support
* Better handling of 1D tensors
* New ops import: SUB, SQRT, DIV, NEG, SQUARED_DIFFERENCE, SUM
* A number of NHWC<->NCHW layout compatibility improvements

resolves #27211

**Merge with extra**: https://github.com/opencv/opencv_extra/pull/1257

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-05-19 10:45:18 +03:00
Vadim Levin
b3e17ea9d4 feat: add conditional inclusion support to header parser 2025-05-19 10:11:52 +03:00
Alexander Smorkalov
eae77dae86
Merge pull request #27327 from mshabunin:fix-intrin-legacy-ops
Restored legacy intrinsics operators in a separate header
2025-05-19 09:19:35 +03:00
Alexander Smorkalov
9d2d927fa9
Merge pull request #27326 from dkurt:handle_multi_output_eltwise_fusion
CUDA: Handle fusion of conv+eltwise in case of multi-output node (i.e. Split)
2025-05-19 09:12:38 +03:00
Madan mohan Manokar
84ea77a4be
Merge pull request #27299 from amd:fast_medianblur_simd
imgproc: medianblur: Performance improvement #27299

* Bottleneck in non-vectorized path reduced.
* AVX512 dispatch added for medianblur.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-19 08:56:57 +03:00
Maksim Shabunin
a9b298eb47 Restored legacy intrinsics operators in a separate header 2025-05-17 16:44:07 +03:00
Dmitry Kurtaev
fd5b33bb00 Handle fusion of conv+eltwise in case of multi-output node (i.e. Split) 2025-05-17 11:31:01 +03:00
chengolivia
250b5003ee
Merge pull request #27305 from chengolivia:add-check-sgbm-nondeterminism
Add image dimension check to avoid StereoSGBM non-determinism #27305 
 
Addresses #25828 

Users noticed that StereoSGBM would occasionally give non-deterministic results for `.compute(imgL, imgR)`.

I and others traced the cause to out-of-bounds access that was not being caught when the input images were not wide enough for the input block size and number of disparities to StereoSGBM. The specific math and logic can be found in the above issue's discussion.

This PR adds a CV_Check to make sure images are wider than 1/2 of the block size + the max disparity the algorithm will search.
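
The shape of such a check can be sketched as below. This is only one reading of the sentence above: it assumes the maximum disparity is `minDisparity + numDisparities` and that half the block size is rounded down. The helper names are hypothetical; the exact expression is in the PR and in the discussion of #25828.

```python
def min_required_width(block_size, min_disparity, num_disparities):
    """Width the image must exceed (assumed formula, see the note above)."""
    max_disparity = min_disparity + num_disparities  # assumption
    return block_size // 2 + max_disparity           # assumption: integer halving

def is_wide_enough(image_width, block_size, min_disparity, num_disparities):
    """True if the image is strictly wider than the assumed minimum."""
    return image_width > min_required_width(block_size, min_disparity, num_disparities)

for width in (640, 60):
    ok = is_wide_enough(width, block_size=5, min_disparity=0, num_disparities=64)
    print(f"width {width}: {'accepted' if ok else 'rejected'}")
```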

The check was only added to the regular `compute` method for StereoSGBM and not to the other modes, as I did not observe the non-deterministic behavior with the other compute modes like HH.

In addition, this PR adds a test case to Calib3d to make sure the check is being thrown in the problem case and that the results are deterministic in the good case.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-17 10:19:09 +03:00
Vincent Rabaud
9201ca1af1
Merge pull request #27321 from vrabaud:norm
Make sure to not access outside normDiffTab #27321

If the norm is outside the array (e.g. Hamming), memory is read outside of the array. This does not matter because the invalid pointer is not used outside of the function (e.g. the Hamming path is taken), but it triggers the sanitizer.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-17 09:59:08 +03:00
Alexander Smorkalov
f3cffcd85d
Merge pull request #27322 from vrabaud:zeros
Add missing Mat_<_Tp>::zeros(int _ndims, const int* _sizes)
2025-05-17 09:55:05 +03:00
Alexander Smorkalov
1c421be489
Merge pull request #27288 from ruisv:ruisv-cuda1209-npp-patch-1
CUDA 12.9 support: build NppStreamContext manually
2025-05-16 12:20:55 +03:00
Alexander Smorkalov
80a8c97dd7
Merge pull request #27317 from asmorkalov:as/disable_ipp_x86_android
Disable IPP for x86 32bit Android as it's incompatible with modern NDK.
2025-05-16 12:16:16 +03:00
Vincent Rabaud
1a624efc0f Add missing Mat_<_Tp>::zeros(int _ndims, const int* _sizes) 2025-05-16 10:50:15 +02:00
Alexander Smorkalov
2af8d0317e
Merge pull request #27311 from asmorkalov:as/draw_axes_warning
Added warning if projected axes are out of camera frame in drawAxes
2025-05-16 10:32:41 +03:00
Alexander Smorkalov
7e12c397d0 Disable IPP for x86 32bit Android as it's incompatible with modern NDK. 2025-05-16 08:34:28 +03:00
Yuantao Feng
f016c728f5
Merge pull request #27315 from fengyuentau:4x/hal/riscv-rvv/refactor_functab_elemsize
python3 "/opencv/platforms/android/build_sdk.py" --build_doc --config "/opencv/platforms/android/default.config.py" --sdk_path "$ANDROID_HOME" --ndk_path "$ANDROID_NDK_HOME" /build | tee /build/build-log.txt

python3 "/opencv/platforms/android/build_java_shared_aar.py" --offline --ndk_location="$ANDROID_NDK_HOME" --cmake_location=$(dirname $(dirname $(which cmake))) /build/OpenCV-android-sdk

hal/riscv-rvv: make use of function tab in copyToMasked and CV_ELEM_SIZE1 in place of elem_size_tab #27315

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-15 09:57:05 +03:00
Alexander Smorkalov
8ceddbff68
Merge pull request #27314 from parona-source:fix-libspng-pkgconfig
cmake: set SPNG_LIBRARY for pkgconfig as well
2025-05-15 09:27:57 +03:00
Alfred Wingate
8fae4a65fe
cmake: set SPNG_LIBRARY for pkgconfig as well
Pkgconfig will set SPNG_LIBRARIES but not SPNG_LIBRARY, which is an issue
because modules/imgcodecs/CMakeLists.txt uses SPNG_LIBRARY.

Bug: https://bugs.gentoo.org/955661
Fixes: c92815238e
Signed-off-by: Alfred Wingate <parona@protonmail.com>
2025-05-14 23:36:56 +03:00
Alexander Smorkalov
5d3a9788eb
Merge pull request #27309 from abhishek-gola:bilateral_filter_bug_fix
Fixed bilateral filter's sigma color and sigma space issue
2025-05-14 17:12:11 +03:00
Dmitry Kurtaev
67ba045e3b
Merge pull request #27284 from dkurt:java_video_capture_read
Java VideoCapture buffered stream constructor #27284

### Pull Request Readiness Checklist

resolves #26809

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-05-14 17:01:45 +03:00
Alexander Smorkalov
e1a74e6d1b Added warning if projected axes are out of camera frame in drawAxes function. 2025-05-14 15:34:51 +03:00
Yuantao Feng
547cef4e88
Merge pull request #27301 from fengyuentau:4x/hal/riscv_rvv/refactor_build
hal/riscv-rvv: refactor the building process #27301

Currently, hal/riscv-rvv is built from headers only, without building an object. This slows down compilation, especially when re-compiling after minor changes in those headers (~170 files need to be re-compiled). This patch solves the problem.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-14 14:04:58 +03:00
Abhishek Gola
838babe351 Fixed bilateral filter's sigma color and sigma space issue 2025-05-14 14:22:05 +05:30
Alexander Smorkalov
868fc5c581
Merge pull request #27302 from asmorkalov:as/openjpeg_status
Cherry-pick OpenJPEG decoding status fix.
2025-05-13 10:34:36 +03:00
Alexander Smorkalov
a39db41390 Cherry-pick OpenJPEG decoding status fix. 2025-05-13 08:56:14 +03:00
Alexander Smorkalov
90e7119ce0
Merge pull request #27293 from shirriff:patch-1
Update match_template.py to fix doc bug
2025-05-12 17:48:15 +03:00
Ken Shirriff
d5a5b0e85f Update match_template.py to fix doc bug
shape has Y first, then X. See issue #27292
2025-05-12 17:06:24 +03:00
Alexander Smorkalov
ce4c3f64e0
Merge pull request #27300 from asmorkalov:as/license_update
License update in CPack generated packages
2025-05-12 12:55:24 +03:00
Alexander Smorkalov
2364056aa3 License update in CPack generated packages. 2025-05-12 11:46:51 +03:00
Kumataro
3a69b11b6d
Merge pull request #27297 from Kumataro:fix27295
imgcodecs: png: add log if first chunk is not IHDR #27297

Close https://github.com/opencv/opencv/issues/27295

To optimize for the native pixel format of the iPhone's early PowerVR GPUs, Apple implemented a non-standard PNG format.

Details: https://theapplewiki.com/wiki/PNG_CgBI_Format
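
The condition being logged can be illustrated with a small parser that reads only the type of the first chunk. This sketch is not the imgcodecs code: the helper names are hypothetical, the byte streams are built in memory, and the `CgBI` payload is a placeholder.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, payload=b""):
    """Serialize one PNG chunk: length, type, payload, CRC-32 over type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)

def first_chunk_type(data):
    """Return the 4-byte type of the first chunk, or None if this is not a PNG stream."""
    if len(data) < len(PNG_SIGNATURE) + 8 or not data.startswith(PNG_SIGNATURE):
        return None
    # Layout: 8-byte signature, 4-byte length, then the 4-byte chunk type.
    return data[12:16]

# IHDR payload: width, height, bit depth, colour type, compression, filter, interlace.
ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0))
standard_png = PNG_SIGNATURE + ihdr
# Placeholder payload; only the chunk order matters for this check.
cgbi_png = PNG_SIGNATURE + make_chunk(b"CgBI", b"\x00\x00\x00\x00") + ihdr

for name, data in (("standard", standard_png), ("CgBI", cgbi_png)):
    chunk = first_chunk_type(data)
    if chunk != b"IHDR":
        print(f"{name}: first chunk is {chunk!r}, not IHDR")
    else:
        print(f"{name}: first chunk is IHDR")
```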

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-05-12 10:44:54 +03:00
Alexander Smorkalov
8035aade11
Merge pull request #27296 from sturkmen72:bugfix-gif
Fix a bug in rendering some animated GIFs
2025-05-12 09:14:07 +03:00
Alexander Smorkalov
59a17bc41a
Merge pull request #27287 from asmorkalov:as/hsv_init
Reworked HSV color conversion tables initialization for OpenCL branch
2025-05-12 09:09:04 +03:00
Alexander Smorkalov
11aefa2338
Merge pull request #27291 from asmorkalov:as/js_charuco_corners
Fixed std::vector<Point3f> handling in JS wrappers
2025-05-12 09:06:29 +03:00
Alexander Smorkalov
306204089f Reworked HSV color conversion tables initialization for OpenCL branch. 2025-05-12 09:02:47 +03:00
Suleyman TURKMEN
9d07c25175
Update grfmt_gif.cpp 2025-05-10 16:06:06 +03:00
Alexander Smorkalov
2f97718bc1 Fixed std::vector<Point3f> handling in JS wrappers. 2025-05-07 16:03:12 +03:00
ruisv
9ab3a249c2
remove private.cuda.hpp:158 space 2025-05-07 11:46:43 +08:00
ruisv
8a2903c190
CUDA 12.9 support: build NppStreamContext manually 2025-05-06 23:47:12 +08:00
Alexander Smorkalov
16a3d37dc1
Merge pull request #27282 from asmorkalov:as/qt_resize
Fixed QT window resize logic
2025-05-06 15:13:33 +03:00
Alexander Smorkalov
7d66431f8e Fixed QT window resize logic. 2025-05-05 14:52:15 +03:00
Alexander Smorkalov
c248d47110
Merge pull request #27268 from Kumataro:fix27267
doc: hal: replace C++ operators with wrapper functions
2025-05-05 09:20:46 +03:00
Dmitry Kurtaev
0ea3c156a4
Merge pull request #27273 from dkurt:dnn_tflite_slice
* TFLite StridedSlice (without strides but just Slice)

* Enable strides for TF importers. Update OpenVINO backend for StridedSlice
2025-05-03 14:47:45 +03:00
Alexander Smorkalov
806eb4767c
Merge pull request #27274 from opencv-pushbot:gitee/alalek/fix_26328
core(OCL): fix POWN OpenCL implementation
2025-05-03 14:44:39 +03:00
Alexander Alekhin
7a9ce585f0 core(ocl): fix POWN OpenCL implementation 2025-05-01 20:57:23 +00:00
Kumataro
37be2a2a68 doc: hal: replace C++ operators with wrapper functions 2025-04-30 05:40:16 +09:00
Alexander Smorkalov
4ad4bd5dc0
Merge pull request #27227 from s-trinh:improve_calib3d_doc_homogeneous_transformation
Add additional information about homogeneous transformations in the calib3d doc
2025-04-28 19:51:34 +03:00
Alexander Smorkalov
956f583b69
Merge pull request #27263 from fengyuentau:4x/hal_rvv/rotate
hal/riscv_rvv: implemented flip_inplace to boost cv::rotate
2025-04-28 11:03:23 +03:00
fengyuentau
ab5a65b5a2 perf: implemented flip_inplace in hal_rvv to boost cv::rotate on RISC-V platforms 2025-04-28 14:18:48 +08:00
Alexander Smorkalov
fe5bd15cdd
Merge pull request #27260 from utibenkei:add_cv_wrap_to_dnn_registeroutput
Add CV_WRAP to registerOutput for language bindings support
2025-04-26 15:07:42 +03:00
Alexander Smorkalov
c9a73061ca
Merge pull request #27258 from asmorkalov:as/gif_on
Enable GIF support by default
2025-04-26 11:16:05 +03:00
Yuantao Feng
2fb786532a
Merge pull request #27257 from fengyuentau:4x/hal_rvv/flip_opt
hal_rvv: further optimized flip #27257

Checklist:
- [x] flipX
- [x] flipY
- [x] flipXY

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-26 11:08:29 +03:00
utibenkei
c774dd41cf Add CV_WRAP to registerOutput for language bindings support 2025-04-26 01:32:11 +09:00
Alexander Smorkalov
55a063f025 Enable GIF support by default. 2025-04-25 15:03:22 +03:00
Alexander Smorkalov
19c4d97638
Merge pull request #27252 from asmorkalov:as/extract_hal
Extract all HALs from 3rdparty to dedicated folder. #27252

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-25 14:56:42 +03:00
Alexander Smorkalov
9533c5633d
Merge pull request #27255 from nina16448:houghcircles_fix
Update houghcircles.py
2025-04-25 11:58:51 +03:00
Kumataro
86a963cec9
Merge pull request #27226 from Kumataro:fix27225
imgproc: cvtColor: stop copying edge pixels for COLOR_Bayer*_VNG. #27226

Close https://github.com/opencv/opencv/issues/27225
Close https://github.com/opencv/opencv/issues/5089
Related https://github.com/opencv/opencv_extra/pull/1249

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-25 11:29:22 +03:00
adsha-quic
edccfa7961
Merge pull request #27184 from CodeLinaro:gemm_fastcv_hal
FastCV gemm hal #27184

FastCV hal for gemm 32f

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-25 11:07:26 +03:00
sirudoi
485c7d5be7
Merge pull request #27230 from sirudoi:4.x
videoio: add Orbbec Gemini 330 camera support #27230

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] The feature is well documented and sample code can be built with the project CMake

### Description of Changes
#### Motivation:
- Orbbec has launched a new RGB-D camera, the Gemini 330. To fully leverage its capabilities, Orbbec simultaneously released version 2 of the open-source OrbbecSDK. This PR adds support for the Gemini 330 series cameras to better meet users’ application requirements.
#### Changes:
- Add support for the Orbbec Gemini330 camera.
- Fixed an issue with Femto Mega on Windows 10/11; for details, see [issue](https://github.com/opencv/opencv/pull/23237#issuecomment-2242347295).
- When enabling `HAVE_OBSENSOR_ORBBEC_SDK`, the build now fetches version 2 of the OrbbecSDK, and the sample API calls have been updated to the v2 format.

### Testing
| OS           | Compiler                                    | Camera          | Result |
|:------------:|:-------------------------------------------:|:---------------:|:------:|
| Windows 11   | (VS2022) MSVC runtime library version 14.40 | Gemini 335/336L | Pass   |
| Windows 11   | (VS2022) MSVC runtime library version 14.19 | Gemini 335/336L | Pass   |
| Ubuntu 22.04 | GCC 11.4                                    | Gemini 335/336L | Pass   |
| Ubuntu 18.04 | GCC 7.5                                     | Gemini 335/336L | Pass   |

### Acknowledgements
Thank you to the OpenCV team for the continuous support and for creating such a robust open source project. I appreciate the valuable feedback from the community and reviewers, which has helped improve the quality of this contribution!
2025-04-25 11:04:19 +03:00
nina16448
6f8f846288 Update houghcircles.py 2025-04-25 15:39:50 +08:00
Alexander Smorkalov
829495355d
Merge pull request #27253 from asmorkalov:as/kleidicv_release_check
Added KleidiCV check for Android SDK release builds
2025-04-24 09:47:14 +03:00
Alexander Smorkalov
9241e0a9f6 Added KleidiCV check for Android SDK release builds. 2025-04-24 07:30:16 +03:00
Yuantao Feng
325e59bd4c
Merge pull request #27229 from fengyuentau:4x/hal_rvv/transpose
HAL: implemented cv_hal_transpose in hal_rvv #27229

Checklists:

- [x] transpose2d_8u
- [x] transpose2d_16u
- [ ] ~transpose2d_8uC3~
- [x] transpose2d_32s
- [ ] ~transpose2d_16uC3~
- [x] transpose2d_32sC2
- [ ] ~transpose_32sC3~
- [ ] ~transpose_32sC4~
- [ ] ~transpose_32sC6~
- [ ] ~transpose_32sC8~
- [ ] ~inplace transpose~


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-22 11:03:26 +03:00
Alexander Smorkalov
cd5a636459
Merge pull request #27249 from fengyuentau:4x/hal_rvv/bugfix-norm2-int
HAL: aligned behavior of normDiff 32s kernels in hal_rvv in 4.x
2025-04-22 10:54:04 +03:00
fengyuentau
a7749c3813 aligned behavior in normDiff in hal_rvv for 4.x 2025-04-22 14:44:42 +08:00
Skreg
e37819c2ac
Merge pull request #27221 from shyama7004:docChanges
Minor changes in calib3d docs for clarity #27221

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-21 21:15:37 +03:00
Alexander Smorkalov
29ef2a1da3
Merge pull request #27245 from utibenkei:fix_java_wrapper_get_set_blobColor
Enable Java bindings for SimpleBlobDetector::blobColor
2025-04-21 20:53:35 +03:00
utibenkei
97f73ba0b5
Merge pull request #27228 from utibenkei:fix_java_enum_wrapper
Explicitly specify enum type scopes to improve Java wrapper generation #27228 

Changed DataLayout and ImagePaddingMode to dnn::DataLayout and dnn::ImagePaddingMode to explicitly specify their scopes. This allows gen_java.py to correctly register disc_type, preventing constructors and methods using these enum types from being skipped during Java wrapper generation.

Similarly updated QRCodeEncoder::CorrectionLevel and QRCodeEncoder::EncodeMode with explicit scope declarations.

Also added a new Java test class `DnnBlobFromImageWithParamsTest` based on: https://github.com/opencv/opencv/blob/4.x/modules/dnn/test/test_misc.cpp#L133-L243

Related issues
#23753 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-21 20:51:38 +03:00
Adrian Kretz
a08b1b6566
Merge pull request #27244 from akretz:fix_issue_27183
Fix QR code encoder with autoversion #27244

The autodetected version is not honored in the `QRCodeEncoderImpl::encode*` methods. This fixes #27183

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-21 17:40:33 +03:00
Ivan Avdeev
a8a3b93043
Merge pull request #27239 from avdivan:4.x
Android-SDK: check flag IPP package #27239

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-21 17:33:14 +03:00
Alexander Smorkalov
0bea67f57b
Merge pull request #27224 from fengyuentau:4x/hal_rvv/compare
HAL: implemented cv_hal_cmp* in hal_rvv
2025-04-21 15:48:31 +03:00
fengyuentau
8eb9d27a31 implemented cv_hal_cmp* in hal_rvv 2025-04-21 18:09:57 +08:00
YooLc
f20facc60a
Merge pull request #27060 from YooLc:hal-rvv-integral
[hal_rvv] Add cv::integral implementation and more types of input for test #27060

This patch introduces an RVV-optimized implementation of `cv::integral()` in hal_rvv, along with performance and accuracy tests for all valid input/output type combinations specified in `modules/imgproc/src/hal_replacement.hpp`:
2a8d4b8e43/modules/imgproc/src/hal_replacement.hpp (L960-L974)

The vectorized prefix sum algorithm follows the approach described in [Prefix Sum with SIMD - Algorithmica](https://en.algorithmica.org/hpc/algorithms/prefix/).
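
For readers unfamiliar with the technique, here is a minimal scalar sketch of a blocked, log-step prefix sum in the spirit of the linked article. It is not the code from this patch: `blockedPrefixSum` and `kLanes` are hypothetical names, and the inner loops only emulate what a vector register with `kLanes` lanes would do with shift-and-add instructions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical lane count standing in for VLEN/SEW; not taken from the patch.
constexpr std::size_t kLanes = 8;

// Inclusive prefix sum, processed one "register" (block of kLanes) at a time.
std::vector<int> blockedPrefixSum(const std::vector<int>& src)
{
    std::vector<int> dst(src);
    int carry = 0; // running total of all previous blocks

    for (std::size_t base = 0; base < dst.size(); base += kLanes)
    {
        const std::size_t n = std::min(kLanes, dst.size() - base);

        // Log-step scan inside the block: lane i accumulates lane i - shift
        // for shift = 1, 2, 4, ... A SIMD version does each step with one
        // lane shift and one add.
        for (std::size_t shift = 1; shift < n; shift *= 2)
        {
            // Walk downwards so every lane reads the value from before this step.
            for (std::size_t i = n; i-- > shift; )
                dst[base + i] += dst[base + i - shift];
        }

        // Propagate the total of the preceding blocks into this block.
        for (std::size_t i = 0; i < n; ++i)
            dst[base + i] += carry;
        carry = dst[base + n - 1];
    }
    return dst;
}
```

Calling `blockedPrefixSum` on one image row yields that row's running sum; an integral image would then add the previous output row element-wise.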

I intentionally omitted support for the following cases by returning `CV_HAL_ERROR_NOT_IMPLEMENTED`, as they are harder to implement or show limited performance gains:
1. **Tilted Sum**: The data access pattern for tilted sums requires multi-row operations, making effective vectorization difficult.
2. **3-channel images (`cn == 3`)**: The current implementation requires `VLEN/SEW` (the number of elements in a vector register) to be a multiple of the channel count, which 3-channel formats typically cannot satisfy.
    - Support for 1-, 2- and 4-channel images is implemented.
3. **Small images (`!(width >> 8 || height >> 8)`)**: The scalar implementation performs better on images with small dimensions.
    - This is the same as `3rdparty/ndsrvp/src/integral.cpp` 09c71aed14/3rdparty/ndsrvp/src/integral.cpp (L24-L26)
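
The three exclusions above can be summarised as a single dispatch check. The sketch below is illustrative only: `IntegralPath` and `chooseIntegralPath` are hypothetical names, and `Fallback` stands for the HAL returning `CV_HAL_ERROR_NOT_IMPLEMENTED` so that the generic implementation runs instead.

```cpp
// Hypothetical result type: Fallback means "return CV_HAL_ERROR_NOT_IMPLEMENTED".
enum class IntegralPath { Fallback, Vector };

// Decide whether the vectorized path applies, following the cases listed above.
IntegralPath chooseIntegralPath(int width, int height, int cn, bool tilted)
{
    // 1. Tilted sums are not vectorized.
    if (tilted)
        return IntegralPath::Fallback;

    // 2. Only 1-, 2- and 4-channel inputs are handled (cn == 3 is excluded).
    if (cn != 1 && cn != 2 && cn != 4)
        return IntegralPath::Fallback;

    // 3. Small images: both dimensions below 256 stay on the scalar code.
    if (!(width >> 8 || height >> 8))
        return IntegralPath::Fallback;

    return IntegralPath::Vector;
}
```

Note that the size test rejects an image only when both `width` and `height` are below 256, so a 256x10 image still takes the vector path under this reading of the condition.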

Test configuration:

- Platform: SpacemiT Muse Pi (K1 @ 1.60 GHz)
- Toolchain: GCC 14.2.0
- `integral_sqsum_full` test is disabled by default, so `--gtest_also_run_disabled_tests` is needed

Test results:

```plaintext
Geometric mean (ms)

                                     Name of Test                                       imgproc-gcc-scalar imgproc-gcc-hal  imgproc-gcc-hal  
                                                                                                                                   vs        
                                                                                                                           imgproc-gcc-scalar
                                                                                                                               (x-factor)      
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)                                   1.973             1.415             1.39       
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)                                   1.343             1.351             0.99       
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)                                   2.021             2.756             0.73       
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)                                   4.695             2.874             1.63       
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)                                   4.028             2.801             1.44       
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)                                   5.965             4.926             1.21       
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)                                   9.970             4.440             2.25       
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)                                   7.934             4.244             1.87       
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)                                   14.696            8.431             1.74       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)                                  5.949             4.108             1.45       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)                                  4.064             4.080             1.00       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)                                  6.137             7.975             0.77       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)                                  13.896            8.721             1.59       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)                                  10.948            8.513             1.29       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)                                  18.046           15.234             1.18       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)                                  35.105           13.778             2.55       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)                                  27.135           13.417             2.02       
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)                                  43.477           25.616             1.70       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)                                 13.386            9.281             1.44       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)                                 9.159             9.194             1.00       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)                                 13.776           17.836             0.77       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)                                 31.943           19.435             1.64       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)                                 24.747           18.946             1.31       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)                                 35.925           33.943             1.06       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)                                 66.493           29.692             2.24       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)                                 54.737           28.250             1.94       
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)                                 91.880           57.495             1.60            
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)                             4.384             4.016             1.09       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)                             3.676             3.960             0.93       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)                             5.620             5.224             1.08       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)                             9.971             7.696             1.30       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)                             8.934             7.632             1.17       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)                             9.927             9.759             1.02       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)                             21.556           12.288             1.75       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)                             21.261           12.089             1.76       
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)                             23.989           16.278             1.47       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)                            15.232           11.752             1.30       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)                            12.976           11.721             1.11       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)                            16.450           15.627             1.05       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)                            25.932           23.243             1.12       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)                            24.750           23.019             1.08       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)                            28.228           29.605             0.95       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)                            61.665           37.477             1.65       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)                            61.536           37.126             1.66       
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)                            73.989           48.994             1.51       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)                           49.640           26.529             1.87       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)                           35.869           26.417             1.36       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)                           34.378           35.056             0.98       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)                           82.138           52.661             1.56       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)                           54.644           52.089             1.05       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)                           75.073           66.670             1.13       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)                          143.283           83.943             1.71       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)                          156.851           82.378             1.90       
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)                          521.594           111.375            4.68            
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32F_32F))          3.529             2.787             1.27       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32F_64F))          4.396             3.998             1.10       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32S_32F))          3.229             2.774             1.16       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32S_32S))          2.945             2.780             1.06       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_32S_64F))          3.857             3.995             0.97       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC1, DEPTH_64F_64F))          5.872             5.228             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16UC1, DEPTH_64F_64F))         6.075             5.277             1.15       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16SC1, DEPTH_64F_64F))         5.680             5.296             1.07       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC1, DEPTH_32F_32F))         3.355             2.896             1.16       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC1, DEPTH_32F_64F))         4.183             4.000             1.05       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC1, DEPTH_64F_64F))         6.237             5.143             1.21       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (64FC1, DEPTH_64F_64F))         4.753             4.783             0.99       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32F_32F))          8.021             5.793             1.38       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32F_64F))          9.963             7.704             1.29       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32S_32F))          7.864             5.720             1.37       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32S_32S))          7.141             5.699             1.25       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_32S_64F))          9.228             7.646             1.21       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC2, DEPTH_64F_64F))          9.940             9.759             1.02       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16UC2, DEPTH_64F_64F))         10.606            9.716             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16SC2, DEPTH_64F_64F))         9.933             9.751             1.02       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC2, DEPTH_32F_32F))         7.986             5.962             1.34       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC2, DEPTH_32F_64F))         9.243             7.598             1.22       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC2, DEPTH_64F_64F))         10.573            9.425             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (64FC2, DEPTH_64F_64F))         11.029            8.977             1.23       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32F_32F))          17.236            8.881             1.94       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32F_64F))          20.905           12.322             1.70       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32S_32F))          16.011            8.666             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32S_32S))          15.932            8.507             1.87       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_32S_64F))          20.713           12.115             1.71       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (8UC4, DEPTH_64F_64F))          23.953           16.284             1.47       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16UC4, DEPTH_64F_64F))         25.127           16.341             1.54       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (16SC4, DEPTH_64F_64F))         24.950           16.441             1.52       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC4, DEPTH_32F_32F))         17.261            8.906             1.94       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC4, DEPTH_32F_64F))         21.944           12.073             1.82       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (32FC4, DEPTH_64F_64F))         25.921           15.539             1.67       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(640x480, (64FC4, DEPTH_64F_64F))         27.938           14.824             1.88       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32F_32F))         11.156            8.260             1.35       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32F_64F))         14.777           11.869             1.24       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32S_32F))         9.693             8.221             1.18       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32S_32S))         9.023             8.256             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_32S_64F))         13.276           11.821             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC1, DEPTH_64F_64F))         15.406           15.618             0.99       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16UC1, DEPTH_64F_64F))        16.799           15.749             1.07       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16SC1, DEPTH_64F_64F))        15.054           15.806             0.95       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC1, DEPTH_32F_32F))        10.055            7.999             1.26       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC1, DEPTH_32F_64F))        13.506           11.253             1.20       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC1, DEPTH_64F_64F))        14.952           15.021             1.00       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (64FC1, DEPTH_64F_64F))        13.761           14.002             0.98       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32F_32F))         22.677           17.330             1.31       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32F_64F))         26.283           23.237             1.13       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32S_32F))         20.126           17.118             1.18       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32S_32S))         19.337           17.041             1.13       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_32S_64F))         24.973           23.004             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC2, DEPTH_64F_64F))         29.959           29.585             1.01       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16UC2, DEPTH_64F_64F))        33.598           29.599             1.14       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16SC2, DEPTH_64F_64F))        46.213           29.741             1.55       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC2, DEPTH_32F_32F))        33.077           17.556             1.88       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC2, DEPTH_32F_64F))        33.960           22.991             1.48       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC2, DEPTH_64F_64F))        41.792           28.803             1.45       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (64FC2, DEPTH_64F_64F))        34.660           28.532             1.21       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32F_32F))         52.989           27.659             1.92       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32F_64F))         62.418           37.515             1.66       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32S_32F))         50.902           27.310             1.86       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32S_32S))         47.301           27.019             1.75       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_32S_64F))         61.982           37.140             1.67       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (8UC4, DEPTH_64F_64F))         79.403           49.041             1.62       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16UC4, DEPTH_64F_64F))        86.550           49.180             1.76       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (16SC4, DEPTH_64F_64F))        85.715           49.468             1.73       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC4, DEPTH_32F_32F))        63.932           28.019             2.28       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC4, DEPTH_32F_64F))        68.180           36.858             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (32FC4, DEPTH_64F_64F))        83.063           46.483             1.79       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1280x720, (64FC4, DEPTH_64F_64F))        91.990           44.545             2.07       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32F_32F))        25.503           18.609             1.37       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32F_64F))        29.544           26.635             1.11       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32S_32F))        22.581           18.514             1.22       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32S_32S))        20.860           18.547             1.12       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_32S_64F))        26.046           26.373             0.99       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC1, DEPTH_64F_64F))        34.831           34.997             1.00       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16UC1, DEPTH_64F_64F))       36.428           35.214             1.03       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16SC1, DEPTH_64F_64F))       32.435           35.314             0.92       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC1, DEPTH_32F_32F))       22.548           18.845             1.20       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC1, DEPTH_32F_64F))       28.589           25.790             1.11       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC1, DEPTH_64F_64F))       32.625           33.791             0.97       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (64FC1, DEPTH_64F_64F))       30.158           31.889             0.95       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32F_32F))        53.374           38.938             1.37       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32F_64F))        73.892           52.747             1.40       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32S_32F))        47.392           38.572             1.23       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32S_32S))        45.638           38.225             1.19       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_32S_64F))        69.966           52.156             1.34       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC2, DEPTH_64F_64F))        68.560           66.963             1.02       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16UC2, DEPTH_64F_64F))       71.487           65.420             1.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16SC2, DEPTH_64F_64F))       68.127           65.718             1.04       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC2, DEPTH_32F_32F))       72.967           39.987             1.82       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC2, DEPTH_32F_64F))       63.933           51.408             1.24       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC2, DEPTH_64F_64F))       73.334           63.354             1.16       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (64FC2, DEPTH_64F_64F))       80.983           60.778             1.33       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32F_32F))       116.981           59.908             1.95       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32F_64F))       155.085           83.974             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32S_32F))       109.567           58.525             1.87       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32S_32S))       105.457           57.124             1.85       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_32S_64F))       157.325           82.485             1.91       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (8UC4, DEPTH_64F_64F))       265.776           111.577            2.38       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16UC4, DEPTH_64F_64F))      585.218           110.583            5.29       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (16SC4, DEPTH_64F_64F))      585.418           111.302            5.26       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC4, DEPTH_32F_32F))      126.456           60.415             2.09       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC4, DEPTH_32F_64F))      169.278           81.460             2.08       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (32FC4, DEPTH_64F_64F))      281.256           104.732            2.69       
integral_sqsum_full::Size_MatType_OutMatDepthArray::(1920x1080, (64FC4, DEPTH_64F_64F))      620.885           99.953             6.21       
```

The vectorized implementation shows progressively better acceleration for larger image sizes and higher channel counts, achieving up to 6.21× speedup for 64FC4 (1920×1080) inputs with `DEPTH_64F_64F` configuration.
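
The ratio column in the table above can be sanity-checked by hand. The sketch below assumes the two timing columns are the baseline run and the patched run, in that order, and that the last column is their ratio rounded to two decimals. The column headers are not repeated next to these rows, so that reading is an assumption. The `speedup` helper is illustrative and is not part of OpenCV's perf tooling.

```python
# Rows copied from the table above:
# (test case, first timing column, second timing column, reported ratio).
# Assumption: first column = baseline, second = patched, ratio = baseline / patched.
ROWS = [
    ("1280x720, (16SC1, DEPTH_64F_64F)", 15.054, 15.806, 0.95),
    ("1920x1080, (16UC4, DEPTH_64F_64F)", 585.218, 110.583, 5.29),
    ("1920x1080, (64FC4, DEPTH_64F_64F)", 620.885, 99.953, 6.21),
]


def speedup(baseline: float, patched: float) -> float:
    """Return baseline / patched rounded to two decimals, as in the table."""
    return round(baseline / patched, 2)


for name, baseline, patched, reported in ROWS:
    computed = speedup(baseline, patched)
    print(f"{name}: computed {computed:.2f}x, reported {reported:.2f}x")
```

Under this reading, a ratio below 1.00 (as in the 16SC1 row) means the patched run was slower for that case.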

This is my first time proposing a patch for the OpenCV project 🥹. If there's anything that can be improved, please tell me.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-04-21 09:50:13 +03:00
Yuantao Feng
11e46cda86
Merge pull request #27201 from fengyuentau:4x/hal_rvv/dotprod
HAL: implemented cv_hal_dotProduct in hal_rvv #27201

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-21 09:05:52 +03:00
utibenkei
d42b6f438e Enable Java bindings for SimpleBlobDetector::blobColor
The C++ uchar type is properly mapped to Java byte type.
2025-04-20 15:57:41 +09:00
quic-xuezha
b5d38ea4cb
Merge pull request #27217 from CodeLinaro:gaussianBlur_hal_fix
Optimize gaussian blur performance in FastCV HAL #27217

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-17 09:56:58 +03:00
Alexander Smorkalov
767dd838d3
Merge pull request #27192 from ddennedy:cmake4
Fix configuring with CMake version 4
2025-04-16 19:05:18 +03:00
adsha-quic
6ffc515b2a
Merge pull request #27182 from CodeLinaro:boxFilter_hal_changes
Parallel_for in box Filter and support for 32f box filter in Fastcv hal #27182

Added parallel_for in box filter hal and support for 32f box filter

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-16 18:33:38 +03:00
Alexander Smorkalov
3962803e7a
Merge pull request #27216 from CodeLinaro:xuezha_3rdPost
Add SVD into FastCV HAL
2025-04-16 17:48:36 +03:00
Alexander Smorkalov
5170f56a1e
Merge pull request #27234 from asmorkalov:fastcv_lib_hash_update
CMake tuning after FastCV binaries update
2025-04-16 17:23:28 +03:00
Alexander Smorkalov
250ea3d7c6 Fixed Android build with FastCV. 2025-04-16 16:26:05 +03:00
adsha-quic
ba6eb8d952
Merge pull request #27214 from CodeLinaro:fastcv_lib_hash_update
Adding latest FastCV static libs
updated libs PR: [opencv/opencv_3rdparty/pull/94](https://github.com/opencv/opencv_3rdparty/pull/94)


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-16 16:24:40 +03:00
Alexander Smorkalov
050dfab749
Merge pull request #27218 from gfrankliu:tbb-lib-upgrade
upgrade tbb to version 2022.1.0
2025-04-16 15:29:09 +03:00
Souriya Trinh
7f7be9bab0 Add additional information about homogeneous transformations. Add quick formulas for conversions between physical focal length, sensor size, fov and camera intrinsic params. 2025-04-13 22:28:11 +02:00
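
For context, the usual pinhole-camera relations between physical focal length, sensor size, field of view, and the focal length in pixels can be sketched as below. These are the standard textbook formulas. The exact formulas this commit adds to the documentation are not shown in this log, and the function names and sample values here are hypothetical.

```python
import math


def focal_length_px(focal_mm: float, sensor_width_mm: float, image_width_px: int) -> float:
    """Convert a physical focal length to pixels: fx = f * W_px / W_sensor."""
    return focal_mm * image_width_px / sensor_width_mm


def horizontal_fov_deg(fx_px: float, image_width_px: int) -> float:
    """Horizontal field of view from fx: fov = 2 * atan(W_px / (2 * fx))."""
    return math.degrees(2.0 * math.atan(image_width_px / (2.0 * fx_px)))


# Hypothetical camera: 4 mm lens, 6.4 mm wide sensor, 1920 px wide image.
fx = focal_length_px(4.0, 6.4, 1920)
fov = horizontal_fov_deg(fx, 1920)
print(f"fx = {fx:.1f} px, horizontal fov = {fov:.2f} deg")
```

The same relations apply vertically with the sensor height and image height, giving fy and the vertical field of view.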
Alexander Smorkalov
6ef5746391
Merge pull request #27194 from asmorkalov:as/ipp_transpose
Migrated IPP impl for flip and transpose to HAL
2025-04-10 21:04:46 +03:00
Kumataro
c1d71d5375
Merge pull request #27220 from Kumataro:fix24757
imgproc: disable SIMD for compareHist(INTERSECT) if f64 is unsupported #27220

Close https://github.com/opencv/opencv/issues/24757

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-10 17:24:01 +03:00
Frank Liu
70ab545b90 upgrade tbb to version 2022.1.0
upgrade tbb from version 2021.11.0 to 2022.1.0 to fix https://github.com/opencv/opencv/issues/25187
2025-04-09 22:57:08 -07:00
Alexander Smorkalov
fa7a0c1e12 Migrated IPP impl for flip and transpose to HAL. 2025-04-10 08:51:12 +03:00
Xue Zhang
0d092c7b1e Add SVD into HAL 2025-04-10 10:51:25 +05:30
Alexander Smorkalov
db9df33d33
Merge pull request #27213 from asmorkalov:as/ipp_cart_polar
Transfer IPP polarToCart to HAL
2025-04-10 07:48:42 +03:00
Alexander Smorkalov
78662ac085 Transfer IPP polarToCart to HAL. 2025-04-09 19:13:18 +03:00
Alexander Smorkalov
09a85e97aa
Merge pull request #27196 from FurkanTahaSaranda:patch-1
Update js_houghcircles_HoughCirclesP.html
2025-04-08 09:10:39 +03:00
Alexander Smorkalov
e826a41eeb
Merge pull request #27202 from asmorkalov:as/drop_ipp_lut
Dropped inefficient (disabled) IPP integration for LUT.
2025-04-08 09:07:52 +03:00
Alexander Smorkalov
e148a2c4aa
Merge pull request #27203 from asmorkalov:as/drop_dead_ipp_convertTo
Drop commented out convertTo impl with IPP.
2025-04-08 09:07:13 +03:00
Alexander Smorkalov
8f74086d3f Drop commented out convertTo impl with IPP. 2025-04-07 14:18:37 +03:00
Alexander Smorkalov
91e078be93 Dropped inefficient (disabled) IPP integration for LUT. 2025-04-07 14:11:13 +03:00
Yuantao Feng
1b3db545a3
Merge pull request #27145 from fengyuentau:4x/core/copyMask-simd
core: further vectorize copyTo with mask #27145

Merge with https://github.com/opencv/opencv_extra/pull/1247.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-07 10:56:02 +03:00
Yuantao Feng
81859255ca
Merge pull request #27175 from fengyuentau:4x/hal_rvv/div_recip
HAL: implemented cv_hal_div* and cv_hal_recip* in hal_rvv #27175

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-04-07 10:36:19 +03:00
FurkanTahaSaranda
55a4a713fd
Update js_houghcircles_HoughCirclesP.html 2025-04-04 17:36:42 +03:00
Alexander Smorkalov
0b3155980a
Merge pull request #25394 from Gao-HaoYuan:in_place_convertTo
Added reinterpret() method to Mat to convert meta-data without actual data conversion
2025-04-04 10:40:33 +03:00
Alexander Smorkalov
d49dee83bf
Merge pull request #27193 from mshabunin:fix-v4l-size
videoio: fixed V4L frame size for non-BGR output
2025-04-03 09:49:36 +03:00
Maksim Shabunin
3f9ed93da2 videoio: fixed V4L frame size for non-BGR output 2025-04-03 07:09:52 +03:00
Dan Dennedy
cb8030809e Fix configuring with CMake version 4
fixes #27122
2025-04-02 13:45:08 -07:00
Maxim Smolskiy
c8e88d8984
Merge pull request #27185 from MaximSmolskiy:specify_dls_and_upnp_mappings_to_epnp_in_all_places_for_solvepnp_tests
Specify DLS and UPnP mappings to EPnP in all places for solvePnP* tests #27185

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-04-02 21:21:56 +03:00
Alexander Smorkalov
8b4b382fa4
Merge pull request #27029 from mshabunin:fix-openblas
build: fix OpenBLAS detection on Linux
2025-04-01 08:13:48 +03:00
Maksim Shabunin
009fdbbea2 build: fix OpenBLAS detection on Linux 2025-03-31 18:54:39 +03:00
Kumataro
09c71aed14
Merge pull request #27107 from Kumataro:fix27105
build: Check supported C++ standard features and user setting #27107

Close #27105 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-03-31 11:31:14 +03:00
Kumataro
1f468dd586
Merge pull request #27169 from Kumataro:fix27168
Imgcodec: gif: remove unnecessary warning #27169

Close https://github.com/opencv/opencv/issues/27168

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-31 11:01:22 +03:00
Yuantao Feng
ec1cbe294a
Merge pull request #27162 from fengyuentau:4x/hal_rvv/copyMask
HAL: added copyToMask and implemented in hal_rvv #27162

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-31 10:49:37 +03:00
sujal
8c7288676e
Merge pull request #26682 from 5usu:4.x
Adding AddRgbFeature(), and improving robustness in ComputeRgbDistance().

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-31 09:47:34 +03:00
Alexander Smorkalov
9d34a88597
Merge pull request #27174 from MaximSmolskiy:more_elegant_skipping_SOLVEPNP_IPPE_methods_in_non_planar-accuracy-tests-for_solvePnP
More elegant skipping SOLVEPNP_IPPE* methods in non-planar accuracy tests for solvePnP*
2025-03-31 09:09:35 +03:00
Alexander Smorkalov
d0c5a04ddb
Merge pull request #27170 from Osse:qt-fix-closing-window
Fix closing of windows when using the Qt backend
2025-03-31 08:58:23 +03:00
天音あめ
14e1f6ce96
Merge pull request #27160 from amane-ame:resize_hal_rvv
Add RISC-V HAL implementation for cv::resize #27160

This patch implements `cv_hal_resize` using native intrinsics, optimizing the performance of `cv::resize` for `CV_INTER_NEAREST/CV_INTER_NEAREST_EXACT/CV_INTER_LINEAR/CV_INTER_LINEAR_EXACT/CV_INTER_AREA` modes.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.1.

```
$ ./opencv_test_imgproc --gtest_filter="*Resize*:*resize*"
$ ./opencv_perf_imgproc --gtest_filter="*Resize*:*resize*" --perf_min_samples=300 --perf_force_samples=300
```

View the full perf table here: [hal_rvv_resize.pdf](https://github.com/user-attachments/files/19480756/hal_rvv_resize.pdf)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-31 08:19:18 +03:00
MaximSmolskiy
289884adc5 More elegant skipping SOLVEPNP_IPPE* methods in non-planar accuracy tests for solvePnP* 2025-03-30 17:00:11 +03:00
Ryan Wong
afc7c0a89c
Merge pull request #27154 from kinchungwong:logging_callback_simple_c
User-defined logger callback, C-style. #27154

This is a competing PR, an alternative to #27140 

Both functions accept a C-style pointer to a static function. Both functions allow restoring the OpenCV built-in implementation by passing in a nullptr.
- replaceWriteLogMessage
- replaceWriteLogMessageEx

This implementation is not compatible with C++ log handler objects.

This implementation has minimal thread safety, in the sense that the function pointers are stored and read atomically. Otherwise, the user-defined static functions must accept calls at all times, even after having been deregistered, because some log calls may have started before deregistration.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-30 16:17:07 +03:00
Alexander Smorkalov
767407e711
Merge pull request #27164 from cudawarped:cmake_add_comment_for_cmp0104
cuda:: Add comment describing code around enable_language(CUDA) in CMakeLists.txt
2025-03-30 13:36:29 +03:00
Øystein Walle
6b4ef1dccd Fix closing of windows when using the Qt backend
Notify the main GUI thread upon receiving a close event instead of when
the window is destroyed. Additionally, there was a logic error in
GuiReceiver::isLastWindow() that is corrected.

Fixes #6479 and #20822
2025-03-28 14:59:15 +01:00
cudawarped
3d5ab56a68 Add comment for CMake 3.18+: if CMAKE_CUDA_ARCHITECTURES is empty, enable_language(CUDA) sets it to the default architecture chosen by the compiler. To trigger the OpenCV custom CUDA architecture search, an empty value needs to be respected; see https://github.com/opencv/opencv/pull/25941. 2025-03-27 16:40:01 +02:00
Vincent Rabaud
42a132088c
Merge pull request #27138 from vrabaud:lzw
Fix heap buffer overflow and use after free in imgcodecs #27138

This fixes:
- https://g-issues.oss-fuzz.com/issues/405243132
- https://g-issues.oss-fuzz.com/issues/405456349

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-26 17:14:50 +03:00
Alexander Smorkalov
8e2826ddd6
Merge pull request #27142 from asmorkalov:as/cuda_standard_all
Set C++ standard for all CUDA configurations
2025-03-26 08:53:16 +03:00
Alexander Smorkalov
9a1f71c14b
Merge pull request #27148 from Kumataro:fix27129
doc: js: unwrap promise-typed cv object
2025-03-26 08:50:53 +03:00
Kumataro
8948faa394 doc: unwrap promise-typed cv object 2025-03-25 23:16:19 +09:00
Alexander Smorkalov
68a595d88b Set C++ standard for all CUDA configurations. 2025-03-25 16:47:12 +03:00
Alexander Smorkalov
c72c527bfe
Merge pull request #27144 from asmorkalov:as/fix_doc_stereobm
Fixed JavaDoc generation for StereoBM.
2025-03-25 16:36:33 +03:00
Alexander Smorkalov
ae443a904b Fixed JavaDoc generation for StereoBM. 2025-03-25 12:33:58 +03:00
天音あめ
fa58c1205b
Merge pull request #27119 from amane-ame:warp_hal_rvv
Add RISC-V HAL implementation for cv::warp series #27119

This patch implements `cv_hal_remap`, `cv_hal_warpAffine` and `cv_hal_warpPerspective` using native intrinsics, optimizing the performance of `cv::remap/cv::warpAffine/cv::warpPerspective` for `CV_HAL_INTER_NEAREST/CV_HAL_INTER_LINEAR/CV_HAL_INTER_CUBIC/CV_HAL_INTER_LANCZOS4` modes.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*Remap*:*Warp*"
$ ./opencv_perf_imgproc --gtest_filter="*Remap*:*remap*:*Warp*" --perf_min_samples=200 --perf_force_samples=200
```

View the full perf table here: [hal_rvv_warp.pdf](https://github.com/user-attachments/files/19403718/hal_rvv_warp.pdf)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-25 11:57:47 +03:00
Gianluca Nordio
931af518d9
Merge pull request #27139 from GianlucaNordio:patch-4
Fix minAreaRect and boxPoints docs (#26799) #27139

As requested in issue #26799, the docs for minAreaRect and boxPoints are extended to specify the order of the corners returned by boxPoints and how the angle is computed for the rotated rect returned by minAreaRect.
2025-03-25 09:30:56 +03:00
Yuantao Feng
a2a2f37ebb
Merge pull request #27115 from fengyuentau:4x/hal_rvv/normDiff
core: refactored normDiff in hal_rvv and extended with support of more data types #27115 

Merge with https://github.com/opencv/opencv_extra/pull/1246.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-25 07:59:59 +03:00
Alexander Smorkalov
7d87f3cda6
Merge pull request #27132 from MaximSmolskiy:add_planar_accuracy_tests_for_solvePnPRansac
Add planar accuracy tests for solvePnPRansac
2025-03-24 10:46:08 +03:00
Alexander Smorkalov
4f4767cb9c
Merge pull request #27125 from asmorkalov:as/ipp_minmax
Move IPP minMaxIdx to HAL
2025-03-24 10:33:32 +03:00
Alexander Smorkalov
bc5545c3e6
Merge pull request #27133 from shyama7004:fix-ptp
minor changes : Replace ndarray.ptp() with np.ptp() for NumPy 2.0 Compatibility
2025-03-24 10:11:44 +03:00
Alexander Smorkalov
d6966f82a3
Merge pull request #27130 from Ma-gi-cian:add-stereobm-docs
Add documentation for StereoBM parameters (fixes #26816)
2025-03-24 09:31:03 +03:00
Alexander Smorkalov
a77623a32b Move IPP minMaxIdx to HAL. 2025-03-24 09:21:22 +03:00
Alexander Smorkalov
0944f7ad26
Merge pull request #27128 from asmorkalov:as/ipp_norm
Move IPP norm and normDiff to HAL #27128

Continues https://github.com/opencv/opencv/pull/26880

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-24 09:17:22 +03:00
shyama7004
ef474e06fc minor changes : Replace ndarray.ptp() with np.ptp() for NumPy 2.0 Compatibility 2025-03-23 23:38:33 +05:30
MaximSmolskiy
5db60e1621 Add planar accuracy tests for solvePnPRansac 2025-03-23 18:24:10 +03:00
Aditya Jha
64535757df Add documentation for StereoBM parameters (fixes #26816) 2025-03-23 13:18:17 +05:30
Alexander Smorkalov
01ef38dcad
Merge pull request #26880 from asmorkalov:as/ipp_hal
Initial version of IPP-based HAL for x86 and x86_64 platforms #26880

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-22 09:31:42 +03:00
Alexander Smorkalov
8566930922
Merge pull request #27114 from CyberWarrior5466:4.x
Update old URL
2025-03-21 15:23:54 +03:00
cudawarped
1d9dda3f09
Merge pull request #27112 from cudawarped:add_cuda_c++17
cuda: Force C++17 Standard for CUDA targets when CUDA Toolkit >=12.8 #27112

Fix https://github.com/opencv/opencv/issues/27095.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-21 14:41:49 +03:00
Alexander Smorkalov
afc4a9ac51
Merge pull request #25027 from opencv-pushbot:gitee/alalek/tests_filter_debug
video(test): filter very long debug tests
2025-03-21 10:22:17 +03:00
天音あめ
46bd22abad
Fix RISC-V HAL solve:SVD and BGRtoLab (#27046)
Fix RISC-V HAL solve/SVD and BGRtoLab #27046

Closes #27044.

Also suppressed some warnings in other HAL.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-21 10:18:51 +03:00
Ali Saleem
4e488a0b16
Update old URL 2025-03-20 19:45:58 +00:00
Scorpion1234567
2e9345570f
Merge pull request #27108 from Scorpion1234567:Multithreading-wrapPolar
When WARP_INVERSE_MAP is used, accelerate the calculation with multi-threading #27108

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-20 17:46:18 +03:00
Vincent Rabaud
c10b800837
Merge pull request #27081 from vrabaud:lzw
GIF: Make sure to resize lzwExtraTable before each block #27081

This fixes https://g-issues.oss-fuzz.com/issues/403364362

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-20 17:31:47 +03:00
Alexander Smorkalov
41e3fcc73e
Merge pull request #27087 from sturkmen72:apng_CV_16U
Fixing imread() function 16 bit APNG reading problem
2025-03-20 17:29:45 +03:00
天音あめ
ec5f7bb9f1
Merge pull request #27097 from amane-ame:blur_hal_rvv
Add RISC-V HAL implementation for cv::blur series #27097

This patch implements `cv_hal_gaussianBlurBinomial`, `cv_hal_medianBlur`, `cv_hal_boxFilter` and `cv_hal_bilateralFilter` using native intrinsics, optimizing the performance of `cv::GaussianBlur/cv::medianBlur/cv::boxFilter/cv::bilateralFilter` for `3x3/5x5` kernels.
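As a rough illustration of what a binomial Gaussian blur computes, here is a minimal pure-Python sketch. It assumes the 3x3 case uses the separable kernel `[1, 2, 1] / 4` and a reflect-101 border; the helper names (`reflect101`, `blur_1d`, `binomial_blur`) are hypothetical and do not come from the patch, which uses RVV intrinsics instead.

```python
def reflect101(i, n):
    """Mirror an out-of-range index into [0, n) without repeating the edge pixel."""
    if n == 1:
        return 0
    while i < 0 or i >= n:
        i = -i if i < 0 else 2 * (n - 1) - i
    return i


def blur_1d(row, kernel):
    """Filter one row with an odd-length kernel, normalised by the kernel sum."""
    r = len(kernel) // 2
    n = len(row)
    total = sum(kernel)
    return [
        sum(kernel[k + r] * row[reflect101(i + k, n)] for k in range(-r, r + 1)) / total
        for i in range(n)
    ]


def binomial_blur(img, kernel=(1, 2, 1)):
    """Separable blur: filter every row, then every column (assumed kernel)."""
    rows = [blur_1d(r, kernel) for r in img]
    cols = [blur_1d(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A single bright pixel spreads into the outer product of the 1D kernel.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 16
blurred = binomial_blur(impulse)
```

Passing `kernel=(1, 4, 6, 4, 1)` gives the corresponding 5x5 sketch under the same assumptions.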

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*Filter*:*Blur*"
$ ./opencv_perf_imgproc --gtest_filter="*gauss*:*box*:*Bilateral*:*median*" --perf_min_samples=2000 --perf_force_samples=2000
```

View the full perf table here: [hal_rvv_blur.pdf](https://github.com/user-attachments/files/19335582/hal_rvv_blur.pdf)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-20 12:59:59 +03:00
Alexander Smorkalov
50072f8d4f
Merge pull request #27089 from amane-ame:hist_hal_rvv
Add RISC-V HAL implementation for cv::equalizeHist
2025-03-20 12:21:00 +03:00
天音あめ
46fbe1895a
Merge pull request #27096 from amane-ame:moments_hal_rvv
Add RISC-V HAL implementation for cv::moments #27096

This patch implements `cv_hal_imageMoments` using native intrinsics, optimizing the performance of `cv::moments` for data types `CV_16U/CV_16S/CV_32F/CV_64F`.
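For readers unfamiliar with image moments, a minimal scalar sketch of the spatial moments up to third order follows. It is a reference formulation only, not the vectorized code from the patch; `raw_moments` and `centroid` are hypothetical helper names.

```python
def raw_moments(img, max_order=3):
    """Return {(p, q): m_pq} where m_pq = sum over pixels of x**p * y**q * I(x, y)."""
    m = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            m[(p, q)] = sum(
                (x ** p) * (y ** q) * v
                for y, row in enumerate(img)
                for x, v in enumerate(row)
            )
    return m


def centroid(m):
    """Centre of mass derived from the zeroth- and first-order moments."""
    return m[(1, 0)] / m[(0, 0)], m[(0, 1)] / m[(0, 0)]


img = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
moments = raw_moments(img)
cx, cy = centroid(moments)
```

Here `x` is the column index and `y` the row index, so the 2x2 block of ones has its centroid halfway between columns 1 and 2 and rows 1 and 2.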

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*Moments*"
$ ./opencv_perf_imgproc --gtest_filter="*Moments*" --perf_min_samples=1000 --perf_force_samples=1000
```

![image](https://github.com/user-attachments/assets/0efbae10-c022-4f15-a81c-682514cdb372)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-20 10:50:06 +03:00
Alexander Smorkalov
67ffb230f1
Merge pull request #27104 from JavaTypedScript:fix#27092
fixed typo
2025-03-20 08:59:02 +03:00
Suleyman TURKMEN
0ed5556cee Add a test to check whether cv::imread successfully reads 16-bit APNG images.
Make proper fixes to pass the test
2025-03-19 21:08:01 +03:00
JavaTypedScript
259ec3674d fixed typo 2025-03-19 21:38:08 +05:30
Kumataro
3e43d0cfca
Merge pull request #26971 from Kumataro:fix26970
imgcodecs: gif: support animated gif without loop #26971

Close #26970

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-03-19 14:24:08 +03:00
amane-ame
b902a8e792 Add equalize_hist.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-18 15:53:05 +08:00
Yuantao Feng
8207549638
Merge pull request #26991 from fengyuentau:4x/core/norm2hal_rvv
core: improve norm of hal rvv #26991

Merge with https://github.com/opencv/opencv_extra/pull/1241

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-18 09:42:55 +03:00
天音あめ
0142231e4d
Merge pull request #27072 from amane-ame:thresh_hal_rvv
Add RISC-V HAL implementation for cv::threshold and cv::adaptiveThreshold #27072

This patch implements `cv_hal_threshold_otsu` and `cv_hal_adaptiveThreshold` using native intrinsics, optimizing the performance of `cv::threshold(THRESH_OTSU)` and `cv::adaptiveThreshold`.

Since UI is as fast as the HAL `cv_hal_rvv::threshold::threshold`, `cv_hal_threshold` is not redirected; this part of the HAL is kept because `cv_hal_threshold_otsu` depends on it.
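For context, Otsu's method picks the threshold that maximises the between-class variance of the histogram. The sketch below is a plain scalar formulation of that idea, assuming pixels with value `<= t` form the first class; `otsu_threshold` is a hypothetical name and the code is not taken from the patch.

```python
def otsu_threshold(hist):
    """Return the bin t that maximises between-class variance w0 * w1 * (mu0 - mu1)**2."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0        # number of pixels with value <= t
    sum0 = 0      # sum of pixel values in that class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        if w0 == 0:
            continue          # first class still empty
        w1 = total - w0
        if w1 == 0:
            break             # second class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Two well-separated peaks: any t in [50, 200) splits them, the first one wins.
hist = [0] * 256
hist[50] = 10
hist[200] = 10
t = otsu_threshold(hist)
```

Ties are resolved in favour of the lowest threshold in this sketch; a real implementation may choose differently.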

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*thresh*:*Thresh*"
$ ./opencv_perf_imgproc --gtest_filter="*otsu*:*adaptiveThreshold*" --perf_min_samples=1000 --perf_force_samples=1000
```

![image](https://github.com/user-attachments/assets/4bb953f8-8589-4af1-8f1c-99e2c506be3c)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-18 09:24:00 +03:00
iiiuhuy
855f20fdfe
Merge pull request #27082 from iiiuhuy:fix_bug
displayOverlay doesn't disappear after timeout #27082

Fixes #26555

### Expected Behaviour
An overlay should be displayed atop an image and then disappear after `delayms` has timed out, but it doesn't. Also, `displayStatusBar` doesn't appear to set any text on the window.

### Actual Behaviour
The overlay appears but doesn't disappear unless a mouse move event happens on the image.

### Changes
- Fixed the issue with `displayOverlay` not disappearing after the timeout.

### Checklist
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV.
- [x] The PR is proposed to the proper branch.
- [x] There is a reference to the original bug report and related work.
- [ ] There is accuracy test, performance test, and test data in the opencv_extra repository, if applicable.
- [ ] The feature is well documented, and sample code can be built with the project CMake.
2025-03-17 19:36:06 +03:00
GenshinImpactStarts
2090407002
Merge pull request #26999 from GenshinImpactStarts:polar_to_cart
[HAL RVV] unify and impl polar_to_cart | add perf test #26999

### Summary

1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces.
2. Add `polarToCart` performance tests
3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar`
4. To achieve the 3rd point, the original implementation was moved, and some modifications were made.
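
The dispatch pattern referred to in point 3 can be sketched roughly as follows: try the HAL hook first, return if it succeeds, fall through to the built-in code if it reports "not implemented", and raise on any other status. The status values are assumed to mirror OpenCV's `CV_HAL_ERROR_OK` and `CV_HAL_ERROR_NOT_IMPLEMENTED`; the function names are hypothetical and this is only an approximation of the C++ macro's behaviour.

```python
import math

# Status codes assumed to mirror CV_HAL_ERROR_OK / CV_HAL_ERROR_NOT_IMPLEMENTED.
HAL_OK = 0
HAL_NOT_IMPLEMENTED = 1


def call_with_hal(hal_func, fallback, *args):
    """Try the HAL hook; fall back to the default path only when it declines."""
    status, result = hal_func(*args)
    if status == HAL_OK:
        return result
    if status != HAL_NOT_IMPLEMENTED:
        raise RuntimeError(f"HAL implementation failed with status {status}")
    return fallback(*args)


def default_polar_to_cart(mag, angle):
    """Built-in path (hypothetical stand-in for the generic implementation)."""
    return [(m * math.cos(a), m * math.sin(a)) for m, a in zip(mag, angle)]


def hal_polar_to_cart(mag, angle):
    """Hypothetical HAL hook that declines inputs longer than 4 elements."""
    if len(mag) > 4:
        return HAL_NOT_IMPLEMENTED, None
    return HAL_OK, default_polar_to_cart(mag, angle)


short = call_with_hal(hal_polar_to_cart, default_polar_to_cart, [1.0], [0.0])
long_ = call_with_hal(hal_polar_to_cart, default_polar_to_cart, [2.0] * 8, [0.0] * 8)
```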

Tested through:
```sh
opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*" 
opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300
```

### HAL performance test

***UPDATE***: The current implementation no longer depends on vlen.

**NOTE**: Due to the 4th point in the summary above, the `scalar` and `ui` tests are based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`.

Vlen 256 (Muse Pi):
```
                   Name of Test                     scalar    ui     rvv       ui        rvv    
                                                                               vs         vs    
                                                                             scalar     scalar  
                                                                           (x-factor) (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.315  0.110  0.034     2.85       9.34   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.423  0.163  0.045     2.59       9.34   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.695  4.325  1.278     3.17      10.71   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   17.719  7.118  2.105     2.49       8.42   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  40.678  13.114 3.977     3.10      10.23   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  53.124  21.298 6.519     2.49       8.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 95.158  29.465 8.894     3.23      10.70   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 119.262 47.743 14.129    2.50       8.44   
```

### Effect of Point 4

To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation detail of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`).

#### Reason for Changes:

This function works as follows:  
$y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, storing the results in the output buffers $x$ and $y$, and then multiplies the result by $\text{mag}$. 

However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to store the $\sin$ and $\cos$ values in case the $\text{mag}$ value gets overwritten. This extra buffer allocation prevents `cv::polarToCart` from functioning in the same way as `cv::cartToPolar`.

Therefore, the multiplication is now performed immediately without storing intermediate values. Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.
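
The aliasing hazard described above can be reproduced with a small scalar sketch. Both functions are hypothetical illustrations, not the code from `mathfuncs.cpp`: the two-pass variant omits the extra buffer on purpose to show what that buffer protects against, while the fused variant reads both inputs before writing any output.

```python
import math


def polar_to_cart_two_pass(mag, angle, x, y):
    """Store cos/sin first, scale by mag afterwards (unsafe if x or y aliases mag)."""
    n = len(angle)
    for i in range(n):
        x[i] = math.cos(angle[i])
        y[i] = math.sin(angle[i])
    for i in range(n):
        x[i] *= mag[i]  # mag[i] may already have been overwritten
        y[i] *= mag[i]


def polar_to_cart_fused(mag, angle, x, y):
    """Multiply immediately; inputs are read before any output is written."""
    for i in range(len(angle)):
        m, a = mag[i], angle[i]
        x[i] = m * math.cos(a)
        y[i] = m * math.sin(a)


angle = [0.0, math.pi / 2]

# In-place call: the magnitude buffer doubles as the x output.
buf_bad = [2.0, 3.0]
y_bad = [0.0, 0.0]
polar_to_cart_two_pass(buf_bad, angle, buf_bad, y_bad)

buf_ok = [2.0, 3.0]
y_ok = [0.0, 0.0]
polar_to_cart_fused(buf_ok, angle, buf_ok, y_ok)
```

With the two-pass variant the first element ends up as `cos(0) * cos(0) = 1.0` instead of `2.0`, because the magnitude was overwritten before it was used.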

***UPDATE***: UI now uses v_sincos from #25892. The original implementation had AVX2 optimizations but was much slower than the current UI, so it was removed; the AVX2 perf test is below. The scalar implementation is unchanged because it is faster than using UI's method.

#### Test Result

The `scalar` and `ui` tests were run on Muse PI, and the AVX2 test was run on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

`scalar` test:
```
                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.333   0.294     1.13   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.385   0.403     0.96   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   14.749  12.343     1.19   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   19.419  16.743     1.16   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  44.155  37.822     1.17   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  62.108  50.358     1.23   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 99.011  85.769     1.15   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 127.740 112.874    1.13   
```

`ui` test:
```
                   Name of Test                      orig     pr        pr    
                                                                        vs    
                                                                       orig   
                                                                    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)     0.306  0.110     2.77   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)     0.455  0.163     2.79   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   13.381  4.325     3.09   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   21.851  7.118     3.07   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  39.975  13.114    3.05   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  67.006  21.298    3.15   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 90.362  29.465    3.07   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 129.637 47.743    2.72   
```

AVX2 test:
```
                   Name of Test                     orig   pr       pr    
                                                                    vs    
                                                                   orig   
                                                                (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)    0.019 0.009    2.11   
PolarToCart::PolarToCartFixture::(127x61, 64FC1)    0.022 0.013    1.74   
PolarToCart::PolarToCartFixture::(640x480, 32FC1)   0.788 0.355    2.22   
PolarToCart::PolarToCartFixture::(640x480, 64FC1)   1.102 0.618    1.78   
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)  2.383 1.042    2.29   
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)  3.758 2.316    1.62   
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1) 5.577 2.559    2.18   
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1) 9.710 6.424    1.51   
```

A slight performance loss occurs because the check for whether $mag$ is nullptr is performed with every calculation, instead of being done once per batch. This allows the current `SinCos_32f` function to be reused.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-17 14:16:09 +03:00
Maxim Smolskiy
b6f213a8c7
Merge pull request #27079 from MaximSmolskiy:add-test-for-ArucoDetector-detectMarkers
Add test for ArucoDetector::detectMarkers #27079

### Pull Request Readiness Checklist

Related to #26968 and #26922

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-03-17 11:24:46 +03:00
Alexander Smorkalov
0a39f98bee
Merge pull request #27067 from amane-ame:sepfilter_optimize
Optimize RISC-V HAL cv::sepFilter
2025-03-17 09:21:33 +03:00
Liutong HAN
6eaaaa410e
Merge pull request #27056 from hanliutong:rvv-hal-copyright
[RVV HAL] Add copyright and replace '#pragma once'. #27056

Add copyright headers in RVV HAL, since other companies or teams may join the development and add their own copyright.

The '#pragma once' directives are also replaced.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-15 17:25:31 +03:00
amane-ame
2c16f3b7d2 Optimize cv::sepFilter.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-14 18:33:57 +08:00
Alexander Smorkalov
14396b8029
Merge pull request #27047 from vrabaud:lzw
Move the CV_Assert above the << operation to not trigger the fuzzer
2025-03-14 09:01:15 +03:00
GenshinImpactStarts
2a8d4b8e43
Merge pull request #27000 from GenshinImpactStarts:cart_to_polar
[HAL RVV] reuse atan | impl cart_to_polar | add perf test #27000

Implement through the existing `cv_hal_cartToPolar32f` and `cv_hal_cartToPolar64f` interfaces.

Add `cartToPolar` performance tests.

cv_hal_rvv::fast_atan is modified to make it more reusable because it's needed in cartToPolar.

**UPDATE**: UI enabled. Since the RVV vector types can't be stored in a struct, the UI implementation of `v_atan_f32` is modified. Both `fastAtan` and `cartToPolar` are affected, so the test results for `atan` are also appended. I have tested the modified UI on RVV and AVX2 and no regressions appear.
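
As a reference for what the operation computes, here is a minimal scalar sketch using exact `math.hypot` and `math.atan2`, with angles wrapped into `[0, 2*pi)`. The HAL and UI paths use a fast approximate arctangent instead, so their results will differ slightly; `cart_to_polar` is a hypothetical name used only for this illustration.

```python
import math


def cart_to_polar(x, y, angle_in_degrees=False):
    """Return (magnitude, angle) lists; angles are wrapped to be non-negative."""
    mag, ang = [], []
    for xi, yi in zip(x, y):
        mag.append(math.hypot(xi, yi))
        a = math.atan2(yi, xi)
        if a < 0:
            a += 2 * math.pi  # atan2 returns (-pi, pi]; shift the negative half
        ang.append(math.degrees(a) if angle_in_degrees else a)
    return mag, ang


mag, ang = cart_to_polar([3.0, 0.0, -1.0], [4.0, -2.0, 0.0])
```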

Perf test done on MUSE-PI. AVX2 test done on Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

```sh
$ opencv_test_core --gtest_filter="*CartToPolar*:*Core_CartPolar_reverse*:*Phase*" 
$ opencv_perf_core --gtest_filter="*CartToPolar*:*phase*" --perf_min_samples=300 --perf_force_samples=300
```

Test result between enabled UI and HAL:
```
                   Name of Test                       ui    rvv      rvv    
                                                                      vs    
                                                                      ui    
                                                                  (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)    0.106  0.059     1.80   
CartToPolar::CartToPolarFixture::(127x61, 64FC1)    0.155  0.070     2.20   
CartToPolar::CartToPolarFixture::(640x480, 32FC1)   4.188  2.317     1.81   
CartToPolar::CartToPolarFixture::(640x480, 64FC1)   6.593  2.889     2.28   
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)  12.600 7.057     1.79   
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)  19.860 8.797     2.26   
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 28.295 15.809    1.79   
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 44.573 19.398    2.30   
phase32f::VectorLength::128                         0.002  0.002     1.20   
phase32f::VectorLength::1000                        0.008  0.006     1.32   
phase32f::VectorLength::131072                      1.061  0.731     1.45   
phase32f::VectorLength::524288                      3.997  2.976     1.34   
phase32f::VectorLength::1048576                     8.001  5.959     1.34   
phase64f::VectorLength::128                         0.002  0.002     1.33   
phase64f::VectorLength::1000                        0.012  0.008     1.58   
phase64f::VectorLength::131072                      1.648  0.931     1.77   
phase64f::VectorLength::524288                      6.836  3.837     1.78   
phase64f::VectorLength::1048576                     14.060 7.540     1.86   
```

Test result before and after enabling UI on RVV:
```
                   Name of Test                      perf   perf     perf   
                                                      ui     ui       ui    
                                                     orig    pr       pr    
                                                                      vs    
                                                                     perf   
                                                                      ui    
                                                                     orig   
                                                                  (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)    0.141  0.106     1.33   
CartToPolar::CartToPolarFixture::(127x61, 64FC1)    0.187  0.155     1.20   
CartToPolar::CartToPolarFixture::(640x480, 32FC1)   5.990  4.188     1.43   
CartToPolar::CartToPolarFixture::(640x480, 64FC1)   8.370  6.593     1.27   
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)  18.214 12.600    1.45   
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)  25.365 19.860    1.28   
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 40.437 28.295    1.43   
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 56.699 44.573    1.27   
phase32f::VectorLength::128                         0.003  0.002     1.54   
phase32f::VectorLength::1000                        0.016  0.008     1.90   
phase32f::VectorLength::131072                      2.048  1.061     1.93   
phase32f::VectorLength::524288                      8.219  3.997     2.06   
phase32f::VectorLength::1048576                     16.426 8.001     2.05   
phase64f::VectorLength::128                         0.003  0.002     1.44   
phase64f::VectorLength::1000                        0.020  0.012     1.60   
phase64f::VectorLength::131072                      2.621  1.648     1.59   
phase64f::VectorLength::524288                      10.780 6.836     1.58   
phase64f::VectorLength::1048576                     22.723 14.060    1.62   
```

Test result before and after modifying UI on AVX2:
```
                   Name of Test                     perf  perf     perf   
                                                    avx2  avx2     avx2   
                                                    orig   pr       pr    
                                                                    vs    
                                                                   perf   
                                                                   avx2   
                                                                   orig   
                                                                (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)    0.006 0.005    1.14   
CartToPolar::CartToPolarFixture::(127x61, 64FC1)    0.010 0.009    1.08   
CartToPolar::CartToPolarFixture::(640x480, 32FC1)   0.273 0.264    1.03   
CartToPolar::CartToPolarFixture::(640x480, 64FC1)   0.511 0.487    1.05   
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)  0.760 0.723    1.05   
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)  2.009 1.937    1.04   
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1) 1.996 1.923    1.04   
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1) 5.721 5.509    1.04   
phase32f::VectorLength::128                         0.000 0.000    0.98   
phase32f::VectorLength::1000                        0.001 0.001    0.97   
phase32f::VectorLength::131072                      0.105 0.111    0.95   
phase32f::VectorLength::524288                      0.402 0.402    1.00   
phase32f::VectorLength::1048576                     0.775 0.767    1.01   
phase64f::VectorLength::128                         0.000 0.000    1.00   
phase64f::VectorLength::1000                        0.001 0.001    1.01   
phase64f::VectorLength::131072                      0.163 0.162    1.01   
phase64f::VectorLength::524288                      0.669 0.653    1.02   
phase64f::VectorLength::1048576                     1.660 1.634    1.02   
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-13 15:56:56 +03:00
Alexander Smorkalov
b129abfdaa
Merge pull request #27055 from hanliutong:UI-loop-condition
Fix some vectorized loop conditions.
2025-03-13 14:12:30 +03:00
Vincent Rabaud
186537a315 Move the CV_Assert above the << operation to not trigger the fuzzer 2025-03-13 10:00:49 +01:00
Liutong HAN
fd62bd0991 Relax the loop condition to process the final batch. 2025-03-13 07:54:41 +00:00
GenshinImpactStarts
e30697fd42
Merge pull request #27002 from GenshinImpactStarts:magnitude
[HAL RVV] impl magnitude | add perf test #27002

Implement through the existing `cv_hal_magnitude32f` and `cv_hal_magnitude64f` interfaces.

**UPDATE**: UI is enabled. The only difference between UI and HAL now is that HAL uses an approximate `sqrt`.

Perf test done on MUSE-PI.

```sh
$ opencv_test_core --gtest_filter="*Magnitude*"
$ opencv_perf_core --gtest_filter="*Magnitude*" --perf_min_samples=300 --perf_force_samples=300
```

Test result between enabled UI and HAL:
```
                 Name of Test                     ui    rvv      rvv    
                                                                  vs    
                                                                  ui    
                                                              (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.029  0.016     1.75   
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.057  0.036     1.57   
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.063  0.648     1.64   
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.261  1.530     1.48   
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.261  2.118     1.54   
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  6.802  4.682     1.45   
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 7.287  4.738     1.54   
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 15.226 10.334    1.47   
```

Test result before and after enabling UI:
```
                 Name of Test                    orig    pr       pr    
                                                                  vs    
                                                                 orig   
                                                              (x-factor)
Magnitude::MagnitudeFixture::(127x61, 32FC1)    0.032  0.029     1.11   
Magnitude::MagnitudeFixture::(127x61, 64FC1)    0.067  0.057     1.17   
Magnitude::MagnitudeFixture::(640x480, 32FC1)   1.228  1.063     1.16   
Magnitude::MagnitudeFixture::(640x480, 64FC1)   2.786  2.261     1.23   
Magnitude::MagnitudeFixture::(1280x720, 32FC1)  3.762  3.261     1.15   
Magnitude::MagnitudeFixture::(1280x720, 64FC1)  8.549  6.802     1.26   
Magnitude::MagnitudeFixture::(1920x1080, 32FC1) 8.408  7.287     1.15   
Magnitude::MagnitudeFixture::(1920x1080, 64FC1) 18.884 15.226    1.24   
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-13 08:34:11 +03:00
Vincent Rabaud
71fe903121
Merge pull request #27040 from vrabaud:png_leak
Make sure there are enough channels to check for opacity #27040

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 21:06:01 +03:00
Alexander Smorkalov
7481cb50b5
Merge pull request #27013 from asmorkalov:as/imencode_animation
Test for in-memory animation encoding and decoding #27013
 
Tests for https://github.com/opencv/opencv/pull/26964

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 18:10:06 +03:00
Alexander Smorkalov
bbcdbca872
Merge pull request #27041 from asmorkalov:as/decolor_opt
Local decolor pipeline optimization
2025-03-12 18:03:13 +03:00
Pierre Chatelier
d83df66ff0
Merge pull request #26834 from chacha21:findContours_speedup
Find contours speedup #26834

This is an attempt, as suggested by #26775, to restore the speed lost when migrating the `findContours()` implementation from C to C++.

The patch adds an "Arena" (a pool) of pre-allocated memory so that contour points (and TreeNodes) can be picked from the Arena.
The code of `findContours()` is mostly unchanged, the arena usage being implicit through a utility class Arena::Item that provides C++ overloaded operators and construct/destruct logic.

As mentioned in #26775, the contour points are allocated and released in order, and can be represented by ranges of indices in their arena. No subset of a range is ever released on its own, so no holes appear in the arena, which is why the internal representation as a range of indices makes sense.

The TreeNodes use another Arena class that does not comply with that range logic.

Currently, there is a significant improvement of the run-time on the test mentioned in #26775, but it is still far from the `findContours_legacy()` performance.
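
The range-of-indices idea can be sketched as below. This is only an illustration, not the class from the patch: `Point2i`, `IndexRange` and `PointArena` are hypothetical names, and the sketch assumes that the most recently allocated range is the first one released.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a contour point.
struct Point2i { int x, y; };

// A contiguous run of points inside the arena, stored as indices
// instead of pointers so it stays valid when the storage grows.
struct IndexRange {
    std::size_t first = 0;
    std::size_t count = 0;
};

// Hypothetical arena: one pre-allocated buffer shared by all contours.
class PointArena {
public:
    explicit PointArena(std::size_t reserve) { storage_.reserve(reserve); }

    // Append the points of one contour at the end and return their range.
    IndexRange push(const std::vector<Point2i>& pts) {
        IndexRange r{storage_.size(), pts.size()};
        storage_.insert(storage_.end(), pts.begin(), pts.end());
        return r;
    }

    // Release the most recently allocated range (assumed release order),
    // so the used part of the buffer never contains a hole.
    void popLast(const IndexRange& r) {
        assert(r.first + r.count == storage_.size());
        storage_.resize(r.first);
    }

    const Point2i& at(const IndexRange& r, std::size_t i) const {
        assert(i < r.count);
        return storage_[r.first + i];
    }

    std::size_t size() const { return storage_.size(); }

private:
    std::vector<Point2i> storage_;
};
```

Because a range holds indices rather than pointers, growing the underlying buffer does not invalidate ranges that were handed out earlier.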


- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 18:00:01 +03:00
Pierre Chatelier
0db6a496ba
Merge pull request #26842 from chacha21:threshold_with_mask
Added optional mask to cv::threshold #26842
 
Proposal for #26777

To avoid code duplication and keep performance when no mask is used, the inner implementations always propagate the `const cv::Mat& mask`, but they take a `template<bool useMask>` parameter that lets the compiler optimize out unnecessary tests when the mask is not used.
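
A minimal sketch of that pattern on plain byte buffers. The function names and signatures are hypothetical and do not come from the patch, and leaving masked-out pixels untouched is an assumption made only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar kernel. With useMask == false the mask test is a
// compile-time constant, so the compiler can drop it from the loop.
template <bool useMask>
void thresholdBinary(const std::uint8_t* src, std::uint8_t* dst,
                     const std::uint8_t* mask, std::size_t n,
                     std::uint8_t thresh, std::uint8_t maxval)
{
    for (std::size_t i = 0; i < n; ++i) {
        // Short-circuit: mask is never dereferenced when useMask is false.
        if (useMask && mask[i] == 0)
            continue;  // assumption: masked-out pixels keep their dst value
        dst[i] = src[i] > thresh ? maxval : 0;
    }
}

// Single runtime branch that selects the instantiation.
// dst must already have the same size as src.
inline void thresholdBinaryDispatch(const std::vector<std::uint8_t>& src,
                                    std::vector<std::uint8_t>& dst,
                                    const std::vector<std::uint8_t>& mask,
                                    std::uint8_t thresh, std::uint8_t maxval)
{
    if (mask.empty())
        thresholdBinary<false>(src.data(), dst.data(), nullptr,
                               src.size(), thresh, maxval);
    else
        thresholdBinary<true>(src.data(), dst.data(), mask.data(),
                              src.size(), thresh, maxval);
}
```

The mask argument is passed through every layer, but the per-pixel cost is paid only in the `useMask == true` instantiation.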

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 17:55:07 +03:00
Alexander Smorkalov
49ab8121b7
Merge pull request #27050 from hanliutong:rvv-fix-27003
RISC-V: Fix #27003.
2025-03-12 17:32:19 +03:00
Yuantao Feng
eefa327f30
Merge pull request #27042 from fengyuentau:4x/core/normDiff_simd
core: vectorize normDiff with universal intrinsics #27042

Merge with https://github.com/opencv/opencv_extra/pull/1242.

Performance results on Desktop Intel i7-12700K, Apple M2, Jetson Orin and SpaceMIT K1:

[perf-normDiff.zip](https://github.com/user-attachments/files/19178689/perf-normDiff.zip)


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 16:43:10 +03:00
Liutong HAN
2969b67bd7 Fix 27003. 2025-03-12 12:15:05 +00:00
Maxim Smolskiy
46dbc57a86
Merge pull request #26968 from MaximSmolskiy:fix-Aruco-marker-incorrect-detection-near-image-edge
Fix Aruco marker incorrect detection near image edge #26968

### Pull Request Readiness Checklist

Fix #26922 

As I understand the algorithm, at the first stage we search for the contours of the marker several times (adaptive threshold with different window sizes). Therefore, for the same marker, we get several contours (inner and outer, with different sizes due to the different window sizes). In the second stage, we group the contours for the same marker into one group, from which we take the largest contour as the best candidate (which should best match the border of the marker).

The problem is that, using the `minDistanceToBorder` parameter, we discard contours at the first stage. Thus, we discard the best candidates, which match the marker border most closely, and inner contours may remain, representing a significantly smaller marker border (which we observe in the issue).

But if we use the `minDistanceToBorder` parameter to discard the best candidate of the group at the second stage, then there will be no such problems and we will completely discard markers located too close to the border of the image.
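
The proposed ordering can be sketched as follows. This is a simplified illustration and not the detector code: `Candidate`, `tooCloseToBorder` and `selectFromGroup` are hypothetical names, contours are reduced to axis-aligned bounds, and the exact border test is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical candidate: axis-aligned bounds stand in for a quad contour.
struct Candidate {
    int minX, minY, maxX, maxY;
    int area() const { return (maxX - minX) * (maxY - minY); }
};

// Assumed border test: any side closer than minDist pixels to the image edge.
inline bool tooCloseToBorder(const Candidate& c, int width, int height, int minDist) {
    return c.minX < minDist || c.minY < minDist ||
           c.maxX > width - 1 - minDist || c.maxY > height - 1 - minDist;
}

// Second-stage filtering: pick the largest contour of the group first,
// then apply the border check to that best candidate only.
// Returns false when the whole marker must be discarded.
inline bool selectFromGroup(const std::vector<Candidate>& group,
                            int width, int height, int minDist, Candidate& best) {
    if (group.empty())
        return false;
    best = *std::max_element(group.begin(), group.end(),
        [](const Candidate& a, const Candidate& b) { return a.area() < b.area(); });
    return !tooCloseToBorder(best, width, height, minDist);
}
```

With first-stage filtering, the outer contour of a marker near the edge would be dropped and a smaller inner contour would survive as the group's only member. Here the outer contour is still selected as the best candidate, so the marker is rejected as a whole.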

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-03-12 09:47:49 +03:00
GenshinImpactStarts
60de3ff24f
Merge pull request #27015 from GenshinImpactStarts:sqrt
[HAL RVV] impl sqrt and invSqrt #27015

Implement through the existing interfaces `cv_hal_sqrt32f`, `cv_hal_sqrt64f`, `cv_hal_invSqrt32f`, `cv_hal_invSqrt64f`.

Perf test done on MUSE-PI and CanMV K230. Because the scalar implementation is much slower than the universal intrinsics, only UI and HAL RVV are compared.

In RVV's UI, `invSqrt` is computed using `1 / sqrt()`. This patch first uses `frsqrt` and then applies the Newton-Raphson method to achieve higher precision. For the initial value, I tried using the famous [fast inverse square root algorithm](https://en.wikipedia.org/wiki/Fast_inverse_square_root), which involves one bit shift and one subtraction. However, on both MUSE-PI and CanMV K230, the performance was slightly lower (about 3%), so I chose to use `frsqrt` for the initial value instead. 
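
A scalar sketch of the refinement step, for illustration only. The hardware estimate instruction is not available in portable C++, so this sketch takes its initial value from the bit-trick estimate mentioned above. That choice is a stand-in and not what the patch uses. The function names are hypothetical, and the number of iterations in the patch is not stated here.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in initial estimate (one shift, one subtraction on the bit pattern).
// Valid for positive, normal floats only.
inline float roughInvSqrt(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// One Newton-Raphson step for f(y) = 1/y^2 - x:
//   y_next = y - f(y)/f'(y) = y * (1.5 - 0.5 * x * y * y)
inline float refineInvSqrt(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

// Estimate followed by a caller-chosen number of refinement steps.
inline float invSqrtApprox(float x, int iterations) {
    float y = roughInvSqrt(x);
    for (int i = 0; i < iterations; ++i)
        y = refineInvSqrt(x, y);
    return y;
}
```

Each step roughly squares the relative error, so a better initial estimate directly reduces the number of steps needed for a given precision.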

BTW, I think this patch can directly replace RVV's UI.

**UPDATE**: Due to clang's strange vector register allocation strategy, for `invSqrt` clang uses LMUL m4 while gcc uses LMUL m8, which leads to some performance loss with clang. So test results for clang are appended.

```sh
$ opencv_test_core --gtest_filter="Core_HAL/mathfuncs.*"
$ opencv_perf_core --gtest_filter="SqrtFixture.*" --perf_min_samples=300 --perf_force_samples=300
```

CanMV K230:
```
              Name of Test                 ui    rvv      rvv    
                                                           vs    
                                                           ui    
                                                       (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)    0.052  0.027     1.96   
Sqrt::SqrtFixture::(127x61, 5, true)     0.101  0.026     3.80   
Sqrt::SqrtFixture::(127x61, 6, false)    0.106  0.059     1.79   
Sqrt::SqrtFixture::(127x61, 6, true)     0.207  0.058     3.55   
Sqrt::SqrtFixture::(640x480, 5, false)   1.988  0.956     2.08   
Sqrt::SqrtFixture::(640x480, 5, true)    3.920  0.948     4.13   
Sqrt::SqrtFixture::(640x480, 6, false)   4.179  2.342     1.78   
Sqrt::SqrtFixture::(640x480, 6, true)    8.220  2.290     3.59   
Sqrt::SqrtFixture::(1280x720, 5, false)  5.969  2.881     2.07   
Sqrt::SqrtFixture::(1280x720, 5, true)   11.731 2.857     4.11   
Sqrt::SqrtFixture::(1280x720, 6, false)  12.533 7.031     1.78   
Sqrt::SqrtFixture::(1280x720, 6, true)   24.643 6.917     3.56   
Sqrt::SqrtFixture::(1920x1080, 5, false) 13.423 6.483     2.07   
Sqrt::SqrtFixture::(1920x1080, 5, true)  26.379 6.436     4.10   
Sqrt::SqrtFixture::(1920x1080, 6, false) 28.200 15.833    1.78   
Sqrt::SqrtFixture::(1920x1080, 6, true)  55.434 15.565    3.56   
```

MUSE-PI:
```
                                                 GCC              |        clang            
              Name of Test                 ui    rvv      rvv     |   ui    rvv      rvv    
                                                           vs     |                   vs    
                                                           ui     |                   ui    
                                                       (x-factor) |               (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)    0.027  0.018     1.46    | 0.027  0.016     1.65   
Sqrt::SqrtFixture::(127x61, 5, true)     0.050  0.017     2.98    | 0.050  0.017     2.99   
Sqrt::SqrtFixture::(127x61, 6, false)    0.053  0.031     1.72    | 0.052  0.032     1.64   
Sqrt::SqrtFixture::(127x61, 6, true)     0.100  0.030     3.31    | 0.101  0.035     2.86   
Sqrt::SqrtFixture::(640x480, 5, false)   0.955  0.483     1.98    | 0.959  0.499     1.92   
Sqrt::SqrtFixture::(640x480, 5, true)    1.873  0.489     3.83    | 1.873  0.520     3.60   
Sqrt::SqrtFixture::(640x480, 6, false)   2.027  1.163     1.74    | 2.037  1.218     1.67   
Sqrt::SqrtFixture::(640x480, 6, true)    3.961  1.153     3.44    | 3.961  1.341     2.95   
Sqrt::SqrtFixture::(1280x720, 5, false)  2.916  1.538     1.90    | 2.912  1.598     1.82   
Sqrt::SqrtFixture::(1280x720, 5, true)   5.735  1.534     3.74    | 5.726  1.661     3.45   
Sqrt::SqrtFixture::(1280x720, 6, false)  6.121  3.585     1.71    | 6.109  3.725     1.64   
Sqrt::SqrtFixture::(1280x720, 6, true)   12.059 3.501     3.44    | 12.053 4.080     2.95   
Sqrt::SqrtFixture::(1920x1080, 5, false) 6.540  3.535     1.85    | 6.540  3.643     1.80   
Sqrt::SqrtFixture::(1920x1080, 5, true)  12.943 3.445     3.76    | 12.908 3.706     3.48   
Sqrt::SqrtFixture::(1920x1080, 6, false) 13.714 8.062     1.70    | 13.711 8.376     1.64   
Sqrt::SqrtFixture::(1920x1080, 6, true)  27.011 7.989     3.38    | 27.115 9.245     2.93   
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-12 08:34:27 +03:00
Suleyman TURKMEN
656038346b
Merge pull request #26441 from sturkmen72:upd_tutorials
Update tutorials #26441

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-11 16:17:21 +03:00
Alexander Smorkalov
1f63b986a1
Merge pull request #26976 from MaximSmolskiy/refactor-ArucoDetector-ArucoDetectorImpl-filterTooCloseCandidates
Refactor ArucoDetector::ArucoDetectorImpl::filterTooCloseCandidates
2025-03-11 16:10:48 +03:00
Alexander Smorkalov
a48e78cdfc
Merge pull request #27026 from amane-ame/filter_hal_rvv
Add RISC-V HAL implementation for cv::filter series
2025-03-11 16:09:45 +03:00
Alexander Smorkalov
d9956fc24f
Merge pull request #26934 from BenjaminKnecht/new_4.x
Extend ArUcoDetector to run multiple dictionaries in an efficient manner.
2025-03-11 14:37:00 +03:00
amane-ame
2dd72201af Remove CV_ASSERT.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-11 18:37:58 +08:00
Alexander Smorkalov
6fb082ae7f
Merge pull request #27001 from DanBmh/opt_newoptcm
Optimize camera matrix undistortion
2025-03-11 12:47:35 +03:00
amane-ame
d9ec808b15 Use the macro from interface.h.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-11 17:44:55 +08:00
Alexander Smorkalov
fa092b4597
Merge pull request #27043 from asmorkalov/as/debayer_warn_fix
Warning fix on Windows.
2025-03-11 12:07:57 +03:00
Alexander Smorkalov
f833519506 Warning fix on Windows. 2025-03-11 11:17:20 +03:00
Alexander Smorkalov
4be88e934f
Merge pull request #27010 from GenshinImpactStarts/exp_log
[HAL RVV] impl exp and log | add log perf test
2025-03-11 10:51:03 +03:00
Alexander Smorkalov
e342d2f339 Local decolor pipeline optimization. 2025-03-11 10:16:01 +03:00
Alexander Smorkalov
4bb57ceb73
Merge pull request #26868 from FantasqueX/bayer2gray-simd-2
Use universal intrinsics in bayer2gray
2025-03-11 09:55:09 +03:00
Alexander Smorkalov
2fbb310265
Merge pull request #27037 from sturkmen72/ImageCollection_animations
Add a test to ensure ImageCollection class works good with animations
2025-03-11 08:18:22 +03:00
Suleyman TURKMEN
6004badce2 ImageCollection animations 2025-03-10 21:02:43 +03:00
Pierre Chatelier
e813326c17
Merge pull request #27039 from chacha21:threshold_otsu_doc_update
Threshold otsu doc update #27039 
 
PR for #27038

(I had already done that, but encountered git madness after branch renaming)

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-10 19:40:45 +03:00
Alexander Smorkalov
3236436892
Merge pull request #27036 from CodeLinaro:xuezha_3rdPost
Fix gaussianBlur5x5 performance regression
2025-03-10 18:21:20 +03:00
Xue Zhang
accebdecf7 Fix gaussianBlur5x5 performance regression 2025-03-10 16:16:56 +05:30
Alexander Smorkalov
316b5d7b08
Merge pull request #27031 from sturkmen72:libjpeg-turbo_ver_3.1.0
Libjpeg-turbo update to version 3.1.0
2025-03-10 13:44:00 +03:00
Daniel
f4a2c35c73 Small updates. 2025-03-10 11:22:24 +01:00
amane-ame
54da5c3e77 Add some algorithm comments.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-10 16:42:58 +08:00
GenshinImpactStarts
830d031213
Merge pull request #26977 from GenshinImpactStarts:helper_hal_rvv
[Refactor](HAL RVV): Consolidate Helpers for Code Reusability #26977

This PR introduces a new helper file with utility types and templates to standardize function interfaces. This refactor allows us to avoid duplicate code when types differ but logic remains the same.

The `flip` and `minmax` implementations have been updated to use the new generic helpers, replacing the previously defined, redundant classes.

Due to the large number of functions, not all interfaces are unified yet. Future development can extend the types as needed. While the usage of function templates is currently limited, this will ease future development.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-10 10:36:48 +03:00
amane-ame
02253dd76b Copy cv::borderInterpolate from core.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-10 15:26:41 +08:00
quic-xuezha
797068853f
Merge pull request #27033 from CodeLinaro:xuezha_3rdPost
Fix assert failure in Sobel test when enable FastCV #27033

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-10 10:24:28 +03:00
Suleyman TURKMEN
6d161c25ef Update libjpeg-turbo version:3.1.0 2025-03-09 00:02:20 +03:00
GenshinImpactStarts
0fed1fa184 fix exp, log | enable ui for log | strengthen test
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-07 17:11:26 +00:00
GenshinImpactStarts
524d8ae01c impl exp and log | add log perf test
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-07 17:11:26 +00:00
amane-ame
e06502a254 Add Morph for MORPH_ERODE and MORPH_DILATE.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-08 00:35:50 +08:00
Alexander Smorkalov
40843d06ab Disable CV_SIMD_SCALABLE for demosaicing as the implementation is not efficient on RISC-V RVV. 2025-03-07 16:24:20 +03:00
amane-ame
a2d784b6f5 Add sepFilter.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-07 20:56:04 +08:00
Alexander Smorkalov
12d182bf9e
Merge pull request #27025 from shyama7004:link
fix the not working link
2025-03-07 15:55:59 +03:00
Alexander Smorkalov
648424eaf2 Code review fixes. 2025-03-07 15:33:54 +03:00
shyama7004
a9b2467868 fix the not working link 2025-03-07 17:39:49 +05:30
Alexander Smorkalov
fbffaa5276 Warning fix. 2025-03-07 11:56:26 +03:00
天音あめ
e89e2fd7ea
Merge pull request #27007 from amane-ame:color_hal_rvv
Add RISC-V HAL implementation for cv::cvtColor #27007

This patch implements the following functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::cvtColor` for all possible data types and modes (except for `COLOR_Bayer`, `COLOR_YUV2GRAY_420` and `COLOR_mRGBA`, as these modes have no HAL interface):

```
cv_hal_cvtBGRtoBGR
cv_hal_cvtBGRtoBGR5x5
cv_hal_cvtBGR5x5toBGR
cv_hal_cvtBGRtoGray
cv_hal_cvtGraytoBGR
cv_hal_cvtBGR5x5toGray
cv_hal_cvtGraytoBGR5x5
cv_hal_cvtBGRtoYUV
cv_hal_cvtYUVtoBGR
cv_hal_cvtBGRtoXYZ
cv_hal_cvtXYZtoBGR
cv_hal_cvtBGRtoHSV
cv_hal_cvtHSVtoBGR
cv_hal_cvtBGRtoLab
cv_hal_cvtLabtoBGR
cv_hal_cvtTwoPlaneYUVtoBGR
cv_hal_cvtBGRtoTwoPlaneYUV
cv_hal_cvtThreePlaneYUVtoBGR
cv_hal_cvtBGRtoThreePlaneYUV
cv_hal_cvtOnePlaneYUVtoBGR
cv_hal_cvtOnePlaneBGRtoYUV
```

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*Color*-*Bayer*"
$ ./opencv_perf_imgproc --gtest_filter="*Color*-*Bayer*" --gtest_also_run_disabled_tests --perf_min_samples=100 --perf_force_samples=100
```

View the full perf table here: [hal_rvv_color.pdf](https://github.com/user-attachments/files/19055417/hal_rvv_color.pdf)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
2025-03-07 11:24:48 +03:00
天音あめ
00956d5c15
Merge pull request #26892 from amane-ame:solve_hal_rvv
Add RISC-V HAL implementation for cv::solve #26892

This patch implements the `cv_hal_LU/cv_hal_Cholesky/cv_hal_SVD/cv_hal_QR` functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::solve` with methods `DECOMP_LU/DECOMP_SVD/DECOMP_CHOLESKY/DECOMP_QR` and data types `32FC1/64FC1`.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_core --gtest_filter="*Solve*:*SVD*:*Cholesky*"
$ ./opencv_perf_core --gtest_filter="*SolveTest*" --perf_min_samples=100 --perf_force_samples=100
```

The tail of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_solve.pdf](https://github.com/user-attachments/files/18725067/hal_rvv_solve.pdf)

<img width="1078" alt="Untitled" src="https://github.com/user-attachments/assets/c01d849c-f000-4bcc-bfe0-a302d6605d9e" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-07 11:14:09 +03:00
天音あめ
bb525fe91d
Merge pull request #26865 from amane-ame:dxt_hal_rvv
Add RISC-V HAL implementation for cv::dft and cv::dct #26865

This patch implements the `static cv::DFT` function in RVV_HAL using native intrinsics, optimizing the performance of `cv::dft` and `cv::dct` with data types `32FC1/64FC1/32FC2/64FC2`.

The reason I chose to create a new `cv_hal_dftOcv` interface is that if I were to use the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`), it would require handling and parsing the dft flags within HAL, as well as performing preprocessing operations such as handling unit roots. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would result in copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary.

Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*DFT*"
$ opencv_perf_core --gtest_filter="*dft*:*dct*" --perf_min_samples=30 --perf_force_samples=30
```

The head of the perf table is shown below since the table is too long.

View the full perf table here: [hal_rvv_dxt.pdf](https://github.com/user-attachments/files/18622645/hal_rvv_dxt.pdf)

<img width="1017" alt="Untitled" src="https://github.com/user-attachments/assets/609856e7-9c7d-4a95-9923-45c1b77eb3a2" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-07 11:08:41 +03:00
GenshinImpactStarts
57a78cb9df
Merge pull request #26941 from GenshinImpactStarts:lut_hal_rvv
Impl hal_rvv LUT | Add more LUT test #26941 

Implement through the existing `cv_hal_lut` interfaces.

Add more LUT accuracy and performance tests:
- **Accuracy test**: Multi-channel table tests are added, and the boundary of `randu` used for generating test data is broadened to make the test more robust.
- **Performance test**: Multi-channel input and multi-channel table tests are added.

Perf test done on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)


```sh
$ opencv_test_core --gtest_filter="Core_LUT*"
$ opencv_perf_core --gtest_filter="SizePrm_LUT*" --perf_min_samples=300 --perf_force_samples=300
```
```
Geometric mean (ms)

         Name of Test          scalar   ui    rvv       ui        rvv    
                                                        vs         vs    
                                                      scalar     scalar  
                                                    (x-factor) (x-factor)
LUT::SizePrm::320x240          0.248  0.249  0.052     1.00       4.74   
LUT::SizePrm::640x480          0.277  0.275  0.085     1.01       3.28   
LUT::SizePrm::1920x1080        0.950  0.947  0.634     1.00       1.50   
LUT_multi2::SizePrm::320x240   2.051  2.045  2.049     1.00       1.00   
LUT_multi2::SizePrm::640x480   2.128  2.134  2.125     1.00       1.00   
LUT_multi2::SizePrm::1920x1080 7.397  7.380  7.390     1.00       1.00   
LUT_multi::SizePrm::320x240    0.715  0.747  0.154     0.96       4.64   
LUT_multi::SizePrm::640x480    0.741  0.766  0.257     0.97       2.88   
LUT_multi::SizePrm::1920x1080  2.766  2.765  1.925     1.00       1.44  
```

This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions:  
- For the U8C1 table type, the optimization works when `vlen >= 256`
- For U16C1, it works when `vlen >= 512`
- For U32C1, it works when `vlen >= 1024`
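
The thresholds above are consistent with a 256-entry table that has to fit into a single register group. The sketch below reproduces that arithmetic. It assumes a group of LMUL=8 registers (the largest grouping in the RVV specification); the PR text does not state which grouping is used, so treat `lmul=8` and the helper name as assumptions.

```python
def min_vlen_for_full_table(elem_bits, entries=256, lmul=8):
    """Smallest vlen (in bits) at which `entries` elements fit in one register group.

    A group of `lmul` registers holds lmul * vlen bits (assumed grouping).
    """
    table_bits = entries * elem_bits
    return -(-table_bits // lmul)  # ceiling division


for name, bits in [("U8C1", 8), ("U16C1", 16), ("U32C1", 32)]:
    print(name, min_vlen_for_full_table(bits))
```

Under the same assumption, halving the group size doubles the required vlen, which is why wider table types quickly exceed what current hardware offers.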

Since I don’t have real hardware with `vlen > 256`, the corresponding accuracy tests were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.

This patch does not implement optimizations for multi-channel tables.

Previous attempts:
1. For the U8C1 table type, when `vlen = 128`, it is possible to use four `u8m4` vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version.
2. Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version.
3. Using `vluxei8` as a general solution does not show any performance improvement.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-06 11:17:00 +03:00
amane-ame
83104bed32 Add Filter2D.
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-03-06 14:10:06 +08:00
Suleyman TURKMEN
dbd4e4549d
Merge pull request #26849 from sturkmen72:apng-writeanimation
APNG encoding optimization #26849

related #26840

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-05 10:42:43 +03:00
Benjamin Knecht
d80fd565b4 Attempt to fix Windows int type warning 2025-03-04 16:24:50 +01:00
Benjamin Knecht
1aa658fa75 Address more comments
Use map to manage unique marker size candidate trees.
Avoid code duplication.
Add a test to show double detection with overlapping dictionaries.
Generalize to marker sizes of not only predefined dictionaries.
2025-03-04 15:24:03 +01:00
Liutong HAN
97abffbdac
Merge pull request #27006 from hanliutong:rvv-fix-ui-1024
Fix issues in RISC-V Vector (RVV) Universal Intrinsic #27006

This PR aims to make `opencv_test_core` pass on RVV through the following two changes:

1. Fix bug in Universal Intrinsic when VLEN >= 512:
- `max_nlanes` should be multiplied by 2, because we use LMUL=2 in RVV Universal Intrinsic since #26318.
- Related tests are also expanded to match longer registers
- Relax the precision threshold of `v_erf` to make the tests pass

2. Temporary fix for #26936:
- Disable 3 Universal Intrinsic code blocks on GCC
- This is only a temporary fix until we determine whether the issue is in our code, in GCC, or somewhere else
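
To illustrate the first fix, here is a rough sketch of how the lane count scales with the register grouping. The helper `nlanes` is hypothetical and is not OpenCV's definition of `max_nlanes`; it only shows that a vector built with LMUL=2 holds twice as many elements as a single register, so a bound sized for LMUL=1 is too small by a factor of two.

```python
def nlanes(vlen_bits, elem_bits, lmul=1):
    """Number of elements held by a group of `lmul` registers of `vlen_bits` each."""
    return vlen_bits * lmul // elem_bits


for vlen in (256, 512, 1024):
    single = nlanes(vlen, 8, lmul=1)   # 8-bit lanes in one register
    grouped = nlanes(vlen, 8, lmul=2)  # 8-bit lanes in an LMUL=2 group
    print(vlen, single, grouped)
```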

This patch is tested under the following conditions:
- Compiler: GCC 14.2, Clang 19.1.7
- Device: Muse-Pi (VLEN=256), QEMU (VLEN=512, 1024)


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-04 16:49:59 +03:00
天音あめ
cbcfd772ce
Merge pull request #26958 from amane-ame:pyramids_hal_rvv
Add RISC-V HAL implementation for cv::pyrDown and cv::pyrUp #26958

This patch implements the `cv_hal_pyrdown` and `cv_hal_pyrup` functions in RVV_HAL using native intrinsics, optimizing the performance of `cv::pyrDown`, `cv::pyrUp` and `cv::buildPyramid` for data types `{8U,16S,32F} x {C1,C2,C3,C4,Cn}`.

Tested on MUSE-PI (Spacemit X60) with both gcc 14.2 and clang 20.0.

```
$ ./opencv_test_imgproc --gtest_filter="*pyr*:*Pyr*"
$ ./opencv_perf_imgproc --gtest_filter="*pyr*:*Pyr*" --perf_min_samples=300 --perf_force_samples=300
```

<img width="1112" alt="Untitled" src="https://github.com/user-attachments/assets/235a9fba-0d29-434e-8a10-498212bac657" />


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-04 15:41:15 +03:00
Alexander Smorkalov
5c6c6af4ec
Merge pull request #27004 from asmorkalov:as/minMax_backport
Backported some CALL_HAL improvements from 5.x #26946
2025-03-04 08:07:30 +03:00
Daniel Bermuth
8a24d41b54
Merge pull request #26988 from DanBmh:opt_undistort
Optimize undistort points #26988

Skips the unnecessary rotation by an identity matrix when no R or P matrices are given.
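
The idea can be sketched in a few lines of pure Python. The function `apply_rotation` is a hypothetical stand-in and not the OpenCV code: when no matrix is supplied, the point is returned as-is instead of being multiplied by an identity matrix, and the result is the same.

```python
IDENTITY = ((1.0, 0.0, 0.0),
            (0.0, 1.0, 0.0),
            (0.0, 0.0, 1.0))


def apply_rotation(point, R=None):
    """Rotate a 3-vector by R; with R omitted, skip the multiplication entirely."""
    if R is None:
        return tuple(point)  # fast path: nothing to rotate
    return tuple(sum(R[r][c] * point[c] for c in range(3)) for r in range(3))


p = (1.0, 2.0, 3.0)
# The fast path and the explicit identity multiplication agree.
print(apply_rotation(p) == apply_rotation(p, IDENTITY))
```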

---------

Co-authored-by: Daniel <daniel@mail.de>
2025-03-03 17:16:09 +03:00
Alexander Smorkalov
1aa69292b0 Backported some CALL_HAL improvements from 5.x #26946 2025-03-03 16:22:48 +03:00
sssanjee-quic
a62b78d6e3
Merge pull request #26910 from CodeLinaro:FastcvHAL_Documentation
Documentation to enable FastCV based OpenCV HAL and Extensions #26910

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-03 15:14:08 +03:00
Skreg
3f1e7fcb8f
Merge pull request #26996 from shyama7004:outofBound
Fix Logical defect in FilterSpecklesImpl #26996

Fixes : #24963

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-03 15:12:18 +03:00
Daniel
e39eb949ea Use only image contour for camera matrix undistortion. 2025-03-03 11:35:05 +01:00
Maxim Smolskiy
dbd3ef9a6f
Merge pull request #26926 from MaximSmolskiy:fix-getPerspectiveTransform-for-singular-case
Fix getPerspectiveTransform for singular case #26926

### Pull Request Readiness Checklist

Fix #26916 

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-03-02 12:44:39 +03:00
Anshuprem
87cc1643f4
Merge pull request #26992 from Anshuprem:4.x
Some minor fixes #26992

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-03-01 14:54:26 +03:00
Alexander Smorkalov
b65fd3b51c
Merge pull request #26985 from xi-guo-0:fix-qnx7.0-build
fix: qnx7.0 build
2025-02-28 15:08:07 +03:00
Benjamin Knecht
3084f950cf Fix dictionary comparison in test 2025-02-27 10:52:27 +01:00
xi-guo
0b8cab368b fix: qnx7.0 build 2025-02-27 14:24:18 +08:00
Alexander Smorkalov
3db1247745
Merge pull request #26918 from GenshinImpactStarts:norm_hamming
Impl RISC-V HAL for norm_hamming
2025-02-26 21:23:04 +03:00
Alexander Smorkalov
4d6d6fb18f
Merge pull request #26983 from AsyaPronina:wa_for_ort_env
G-API/ORT: Workaround exception during OV EP append
2025-02-26 21:19:33 +03:00
Alexander Smorkalov
4f6996b5dd
Merge pull request #26982 from asmorkalov:as/backport_c_api
Backported some C API cleanup from 5.x to 4.x to reduce conflicts in 4.x->5.x merge
2025-02-26 20:30:47 +03:00
Anastasiya Pronina
76d3bf0a3b Workaround for successful append of OpenVINO Execution Provider: Moved creation of 'Ort::Env' before it 2025-02-26 16:55:48 +00:00
Kumataro
a63ede6b1d
Merge pull request #26930 from Kumataro:fix26924
Imgcodecs: gif: support Disposal Method #26930

Close https://github.com/opencv/opencv/issues/26924

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-02-26 17:15:41 +03:00
Alexander Smorkalov
d3792dad86 Backported some C API cleanup from 5.x to 4.x to reduce conflicts in 4.x->5.x merge. 2025-02-26 17:11:31 +03:00
Maksim Shabunin
43551b72d7
Merge pull request #26948 from mshabunin:fix-videoio-test-params
videoio: print test params instead of indexes #26948
_videoio_ test names changed: they now use a parameter string instead of an index.
E.g. `videoio_read.threads/0` is now `videoio_read.threads/h264_0_RAW`.
This allows filtering tests independently of the platform.

**Notes:**
- not all tests have been updated - only the simpler ones and those whose parameters vary depending on the platform
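
As a rough illustration of why string-based names help, the sketch below uses Python's `fnmatch` as a stand-in for `--gtest_filter` wildcard matching (both treat `*` as "any string"; `fnmatch` additionally understands `[...]`, which is not needed here). The filter pattern is a hypothetical example; the two test names are the ones quoted above.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the test names that match a wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


old_style = ["videoio_read.threads/0"]           # index-based name
new_style = ["videoio_read.threads/h264_0_RAW"]  # parameter-based name
pattern = "videoio_read.threads/h264_*"          # hypothetical filter

print(select(old_style, pattern))  # the index says nothing about the codec
print(select(new_style, pattern))
```

With index-based names the same index can refer to different parameters on different platforms, so a pattern like this could not be written reliably.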
2025-02-26 14:04:37 +03:00
Suleyman TURKMEN
39bc5df72a
Merge pull request #26973 from sturkmen72:png_test
Add a test related IMWRITE_PNG_COMPRESSION parameter #26973

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-25 15:24:25 +03:00
Benjamin Knecht
d869b12e89 Fixing warnings in tests 2025-02-25 11:50:13 +01:00
GenshinImpactStarts
33d632f85e impl hal_rvv norm_hamming
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
2025-02-25 02:31:02 +00:00
Benjamin Knecht
314f99f7a0 Remove add/removeDictionary and retain ABI of set/getDictionary
functions
2025-02-24 18:01:10 +01:00
Benjamin Knecht
6c3b195a57 Make sure serialization with single dict preserves old behavior 2025-02-24 17:33:09 +01:00
MaximSmolskiy
63ad15c243 Refactor ArucoDetector::ArucoDetectorImpl::filterTooCloseCandidates 2025-02-24 19:26:48 +03:00
Benjamin Knecht
3c88a001a2 Add docs to Dictionary get/set/add/remove functions 2025-02-24 14:30:48 +01:00
GenshinImpactStarts
6a6a5a765d
Merge pull request #26943 from GenshinImpactStarts:flip_hal_rvv
Impl RISC-V HAL for cv::flip | Add perf test for flip #26943 

Implement through the existing `cv_hal_flip` interfaces.

Add perf test for `cv::flip`.

Reasons for choosing these test arguments:
- **size**: copied from perf_lut
- **type**:
    - U8C1: basic situation
    - U8C3: unaligned element size
    - U8C4: large element size

Tested on
- MUSE-PI (vlen=256)
- Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)

```sh
$ opencv_test_core --gtest_filter="Core_Flip/ElemWiseTest.*"
$ opencv_perf_core --gtest_filter="Size_MatType_FlipCode*" --perf_min_samples=300 --perf_force_samples=300
```

```
Geometric mean (ms)

                     Name of Test                       scalar   ui    rvv       ui        rvv    
                                                                                 vs         vs    
                                                                               scalar     scalar  
                                                                             (x-factor) (x-factor)
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_X)    0.026  0.033  0.031     0.81       0.84   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_XY)   0.206  0.212  0.091     0.97       2.26   
flip::Size_MatType_FlipCode::(320x240, 8UC1, FLIP_Y)    0.185  0.189  0.082     0.98       2.25   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_X)    0.070  0.084  0.084     0.83       0.83   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_XY)   0.616  0.612  0.235     1.01       2.62   
flip::Size_MatType_FlipCode::(320x240, 8UC3, FLIP_Y)    0.587  0.603  0.204     0.97       2.88   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_X)    0.263  0.110  0.109     2.40       2.41   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_XY)   0.930  0.831  0.316     1.12       2.95   
flip::Size_MatType_FlipCode::(320x240, 8UC4, FLIP_Y)    1.175  1.129  0.313     1.04       3.75   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_X)    0.303  0.118  0.111     2.57       2.73   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_XY)   0.949  0.836  0.405     1.14       2.34   
flip::Size_MatType_FlipCode::(640x480, 8UC1, FLIP_Y)    0.784  0.783  0.409     1.00       1.92   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_X)    1.084  0.360  0.355     3.01       3.06   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_XY)   3.768  3.348  1.364     1.13       2.76   
flip::Size_MatType_FlipCode::(640x480, 8UC3, FLIP_Y)    4.361  4.473  1.296     0.97       3.37   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_X)    1.252  0.469  0.451     2.67       2.78   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_XY)   5.732  5.220  1.303     1.10       4.40   
flip::Size_MatType_FlipCode::(640x480, 8UC4, FLIP_Y)    5.041  5.105  1.203     0.99       4.19   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_X)  2.382  0.903  0.903     2.64       2.64   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_XY) 8.606  7.508  2.581     1.15       3.33   
flip::Size_MatType_FlipCode::(1920x1080, 8UC1, FLIP_Y)  8.421  8.535  2.219     0.99       3.80   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_X)  6.312  2.416  2.429     2.61       2.60   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_XY) 29.174 26.055 12.761    1.12       2.29   
flip::Size_MatType_FlipCode::(1920x1080, 8UC3, FLIP_Y)  25.373 25.500 13.382    1.00       1.90   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_X)  7.620  3.204  3.115     2.38       2.45   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_XY) 32.876 29.310 12.976    1.12       2.53   
flip::Size_MatType_FlipCode::(1920x1080, 8UC4, FLIP_Y)  28.831 29.094 14.919    0.99       1.93   
```

The optimizations for vlen <= 256 and vlen > 256 differ, but I have no real hardware with vlen > 256, so the accuracy tests for larger values such as 512 and 1024 were conducted on QEMU built from the `riscv-collab/riscv-gnu-toolchain`.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-24 08:56:23 +03:00
Daniil Anufriev
b5f5540e8a
Merge pull request #26886 from sk1er52:feature/exp64f
Enable SIMD_SCALABLE for exp and sqrt #26886

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
```
CPU - Banana Pi k1, compiler - clang 18.1.4
```
```
Geometric mean (ms)

              Name of Test               baseline  hal     ui      hal         ui    
                                                                    vs         vs
                                                                 baseline   baseline
                                                                (x-factor) (x-factor)
Exp::ExpFixture::(127x61, 32FC1)          0.358     --   0.033      --       10.70   
Exp::ExpFixture::(640x480, 32FC1)         14.304    --   1.167      --       12.26   
Exp::ExpFixture::(1280x720, 32FC1)        42.785    --   3.538      --       12.09
Exp::ExpFixture::(1920x1080, 32FC1)       96.206    --   7.927      --       12.14   
Exp::ExpFixture::(127x61, 64FC1)          0.433   0.050  0.098     8.59       4.40   
Exp::ExpFixture::(640x480, 64FC1)         17.315  1.935  3.813     8.95       4.54   
Exp::ExpFixture::(1280x720, 64FC1)        52.181  5.877  11.519    8.88       4.53   
Exp::ExpFixture::(1920x1080, 64FC1)      117.082  13.157 25.854    8.90       4.53
```
Additionally, this PR brings Sqrt optimization with UI:
```
Geometric mean (ms)

              Name of Test                     baseline    ui       ui    
                                                                    vs
                                                                 baseline
                                                                (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)            0.111   0.027     4.11   
Sqrt::SqrtFixture::(127x61, 6, false)            0.149   0.053     2.82   
Sqrt::SqrtFixture::(640x480, 5, false)           4.374   0.967     4.52   
Sqrt::SqrtFixture::(640x480, 6, false)           5.885   2.046     2.88   
Sqrt::SqrtFixture::(1280x720, 5, false)          12.960  2.915     4.45   
Sqrt::SqrtFixture::(1280x720, 6, false)          17.648  6.107     2.89   
Sqrt::SqrtFixture::(1920x1080, 5, false)         29.178  6.524     4.47   
Sqrt::SqrtFixture::(1920x1080, 6, false)         39.709  13.670    2.90   
```

Reference
Muller, J.-M. Elementary Functions: Algorithms and Implementation. 2nd ed. Boston: Birkhäuser, 2006.
https://www.springer.com/gp/book/9780817643720
2025-02-21 17:36:54 +03:00
Alexander Smorkalov
a256886838
Merge pull request #26949 from shyama7004:Fix
replace deprecated np.fromstring() by np.frombuffer()
2025-02-21 13:55:23 +03:00
Yuantao Feng
e2803bee5c
Merge pull request #26885 from fengyuentau:4x/core/normalize_simd
core: vectorize cv::normalize / cv::norm #26885

Checklist:
|      | normInf | normL1 | normL2 |
| ---- | ------- | ------ | ------ |
| bool |    -    |   -    |   -    |
| 8u   |    √    |   √    |   √    |
| 8s   |    √    |   √    |   √    |
| 16u  |    √    |   √    |   √    |
| 16s  |    √    |   √    |   √    |
| 16f  |    -    |   -    |   -    |
| 16bf |    -    |   -    |   -    |
| 32u  |    -    |   -    |   -    |
| 32s  |    √    |   √    |   √    |
| 32f  |    √    |   √    |   √    |
| 64u  |    -    |   -    |   -    |
| 64s  |    -    |   -    |   -    |
| 64f  |    √    |   √    |   √    |

*: Vectorization of the data types marked `-` (bool, 16f, 16bf, 32u, 64u and 64s) needs to be done on 5.x.
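
For reference, the three norms in the table can be written as the scalar reductions below. This is a plain-Python sketch of the mathematical definitions with illustrative function names; it says nothing about how the vectorized kernels are structured.

```python
import math


def norm_inf(values):
    """Largest absolute value (0 for empty input)."""
    return max((abs(v) for v in values), default=0)


def norm_l1(values):
    """Sum of absolute values."""
    return sum(abs(v) for v in values)


def norm_l2(values):
    """Square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))


data = [3, -4, 0]
print(norm_inf(data), norm_l1(data), norm_l2(data))
```

Each norm is a single pass over the data with an independent per-element operation, which is what makes them natural candidates for SIMD.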

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-21 13:49:11 +03:00
shyama7004
a47f0f00cb replace deprecated np.fromstring() by np.frombuffer() 2025-02-21 10:37:11 +05:30
Benjamin Knecht
364eedb87e Undo multi dict functionality of refineDetectedMarkers method 2025-02-20 15:37:44 +01:00
Dmitry Kurtaev
7a2b048c92
Merge pull request #26923 from dkurt:merge_rvv_opt
Further optimization of cv::merge RVV HAL for 8U and 16S #26923

### Pull Request Readiness Checklist


* Banana Pi BF3 (SpacemiT K1) RISC-V
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)

```
Geometric mean (ms)

                     Name of Test                       baseline   pr       pr
                                                         merge              vs    
                                                                         baseline
                                                                          merge
                                                                        (x-factor)
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 2)      0.013   0.003     3.76   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 3)      0.020   0.006     3.46   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 4)      0.026   0.010     2.61   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 5)      0.043   0.028     1.56   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 6)      0.054   0.035     1.53   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 7)      0.065   0.050     1.30   
merge::Size_SrcDepth_DstChannels::(127x61, 8UC1, 8)      0.070   0.036     1.95   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 2)     0.015   0.008     1.82   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 3)     0.022   0.015     1.48   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 4)     0.029   0.018     1.63   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 5)     0.067   0.044     1.54   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 6)     0.088   0.056     1.58   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 7)     0.104   0.076     1.38   
merge::Size_SrcDepth_DstChannels::(127x61, 16SC1, 8)     0.116   0.065     1.79   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 2)     0.421   0.176     2.39   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 3)     0.792   0.284     2.79   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 4)     1.090   0.370     2.95   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 5)     1.835   1.399     1.31   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 6)     2.389   1.776     1.35   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 7)     3.000   2.471     1.21   
merge::Size_SrcDepth_DstChannels::(640x480, 8UC1, 8)     3.178   2.104     1.51   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 2)    0.490   0.377     1.30   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 3)    1.348   0.602     2.24   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 4)    1.827   0.813     2.25   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 5)    3.283   2.692     1.22   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 6)    4.922   3.334     1.48   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 7)    5.725   4.399     1.30   
merge::Size_SrcDepth_DstChannels::(640x480, 16SC1, 8)    6.278   4.748     1.32   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 2)    1.267   0.603     2.10   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 3)    2.394   0.934     2.56   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 4)    3.236   1.434     2.26   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 5)    5.398   4.345     1.24   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 6)    7.127   5.459     1.31   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 7)    8.590   7.298     1.18   
merge::Size_SrcDepth_DstChannels::(1280x720, 8UC1, 8)    9.360   6.152     1.52   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 2)   1.482   1.242     1.19   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 3)   4.008   1.817     2.21   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 4)   6.079   2.468     2.46   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 5)   11.300  8.644     1.31   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 6)   15.125  12.126    1.25   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 7)   17.555  14.804    1.19   
merge::Size_SrcDepth_DstChannels::(1280x720, 16SC1, 8)   18.890  14.163    1.33   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 2)   2.910   1.326     2.19   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 3)   5.351   1.997     2.68   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 4)   7.290   2.629     2.77   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 5)   12.426  9.611     1.29   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 6)   16.453  12.162    1.35   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 7)   19.420  16.190    1.20   
merge::Size_SrcDepth_DstChannels::(1920x1080, 8UC1, 8)   20.588  13.699    1.50   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 2)  3.400   2.640     1.29   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 3)  8.986   3.952     2.27   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 4)  11.972  5.273     2.27   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 5)  20.544  17.996    1.14   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 6)  28.677  22.086    1.30   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 7)  32.958  27.713    1.19   
merge::Size_SrcDepth_DstChannels::(1920x1080, 16SC1, 8)  36.499  27.439    1.33
```

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-02-20 17:28:28 +03:00
Benjamin Knecht
1f9d6aa6cf Fixed warning on Windows, clarified refineDetectedMarkers method 2025-02-20 15:12:56 +01:00
Alexander Smorkalov
58b14294b5
Merge pull request #26942 from vrabaud:png_leak
Bump openjp2 to v2.5.3
2025-02-20 12:56:54 +03:00
Vincent Rabaud
a6bfd87943 Bump openjp2 to v2.5.3
This should quiet some fuzzer bugs
2025-02-20 10:46:33 +03:00
Benjamin Knecht
f212c163e3 have two detectMarkers functions for python backwards compatibility
using multiple dictionaries for refinement (function split not necessary
as it's backwards compatible)
2025-02-19 18:45:06 +01:00
Benjamin Knecht
9ae23a7f51 Fix index comparison warnings 2025-02-19 10:53:03 +01:00
kyler1cartesis
d32d4da9a3
Merge pull request #26887 from kyler1cartesis:4.x
invSqrt SIMD_SCALABLE implementation & HAL tests refactoring #26887

Enable CV_SIMD_SCALABLE for invSqrt.

* Banana Pi BF3 (SpacemiT K1) RISC-V
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)

```
Geometric mean (ms)

Name of Test                                   baseline  simd scalable  simd scalable vs baseline (x-factor)
InvSqrtf::InvSqrtfFixture::(127x61, 32FC1)     0.163     0.051          3.23
InvSqrtf::InvSqrtfFixture::(127x61, 64FC1)     0.241     0.103          2.35
InvSqrtf::InvSqrtfFixture::(640x480, 32FC1)    6.460     1.893          3.41
InvSqrtf::InvSqrtfFixture::(640x480, 64FC1)    9.687     3.999          2.42
InvSqrtf::InvSqrtfFixture::(1280x720, 32FC1)   19.292    5.701          3.38
InvSqrtf::InvSqrtfFixture::(1280x720, 64FC1)   29.452    11.963         2.46
InvSqrtf::InvSqrtfFixture::(1920x1080, 32FC1)  43.326    12.805         3.38
InvSqrtf::InvSqrtfFixture::(1920x1080, 64FC1)  65.566    26.881         2.44
```
2025-02-19 12:13:48 +03:00
Benjamin Knecht
bb07ce7454 Address comments, add Python test 2025-02-18 17:03:37 +01:00
Benjamin Knecht
379b5a2fdb Fix python bindings 2025-02-18 14:08:09 +01:00
Alexander Smorkalov
6092499907
Merge pull request #26932 from shyama7004:deprecationFix
replace tostring() with tobytes()
2025-02-18 13:22:04 +03:00
Alexander Smorkalov
b5c3b706de
Merge pull request #26933 from asmorkalov:as/drop_android_test
Removed Android test as it's broken for now
2025-02-18 13:21:31 +03:00
Benjamin Knecht
c759a7cdde Extend ArUcoDetector to run multiple dictionaries in an efficient
manner.

* Add constructor for multiple dictionaries
* Add get/set/remove/add functions for multiple dictionaries
* Add unit tests

TESTED=unit tests
2025-02-18 11:04:05 +01:00
Alexander Smorkalov
f570852d20 Removed Android test as it's broken for now. 2025-02-18 12:52:24 +03:00
shyama7004
c5ad6d7904 replace tostring() with tobytes 2025-02-18 12:25:01 +05:30
Letu Ren
0fa61de22a Fix bayer2RGB_EA macro 2025-02-03 14:19:52 +08:00
Letu Ren
d6dc22d03c Fix build on RISC-V 2025-02-03 00:09:36 +08:00
Letu Ren
f1a775825f Use universal intrinsics in bayer2Gray 2024-12-21 23:29:39 +08:00
gaohaoyuan
603344fa54 add API to reinterpret Mat type 2024-07-30 11:04:58 +08:00
Alexander Alekhin
4c7a70cb5f video(test): filter very long debug tests 2024-02-15 07:52:28 +00:00
674 changed files with 31150 additions and 25279 deletions

@@ -46,9 +46,6 @@ jobs:
Android-SDK:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-4.x-Android-SDK.yaml@main
-Android-Test:
-uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-Android-Test.yaml@main
TIM-VX:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-timvx-backend-tests-4.x.yml@main

@@ -1,23 +1,23 @@
function(download_fastcv root_dir)
# Commit SHA in the opencv_3rdparty repo
-set(FASTCV_COMMIT "f4413cc2ab7233fdfc383a4cded402c072677fb0")
+set(FASTCV_COMMIT "abe340d0fb7f19fa9315080e3c8616642e98a296")
# Define actual FastCV versions
if(ANDROID)
if(AARCH64)
message(STATUS "Download FastCV for Android aarch64")
-set(FCV_PACKAGE_NAME "fastcv_android_aarch64_2024_12_11.tgz")
-set(FCV_PACKAGE_HASH "9dac41e86597305f846212dae31a4a88")
+set(FCV_PACKAGE_NAME "fastcv_android_aarch64_2025_04_29.tgz")
+set(FCV_PACKAGE_HASH "d9172a9a3e5d92d080a4192cc5691001")
else()
message(STATUS "Download FastCV for Android armv7")
-set(FCV_PACKAGE_NAME "fastcv_android_arm32_2024_12_11.tgz")
-set(FCV_PACKAGE_HASH "fe2d30334180b17e3031eee92aac43b6")
+set(FCV_PACKAGE_NAME "fastcv_android_arm32_2025_04_29.tgz")
+set(FCV_PACKAGE_HASH "246b5253233391cd2c74d01d49aee9c3")
endif()
elseif(UNIX AND NOT APPLE AND NOT IOS AND NOT XROS)
if(AARCH64)
-set(FCV_PACKAGE_NAME "fastcv_linux_aarch64_2025_02_12.tgz")
-set(FCV_PACKAGE_HASH "33ac2a59cf3e7d6402eee2e010de1202")
+set(FCV_PACKAGE_NAME "fastcv_linux_aarch64_2025_04_29.tgz")
+set(FCV_PACKAGE_HASH "e2ce60e25c8e4113a7af2bd243118f4c")
else()
message("FastCV: fastcv lib for 32-bit Linux is not supported for now!")
endif()

@@ -1,9 +0,0 @@
cmake_minimum_required(VERSION ${MIN_VER_CMAKE} FATAL_ERROR)
set(HAL_LIB_NAME "")
set(RVV_HAL_FOUND TRUE CACHE INTERNAL "")
set(RVV_HAL_VERSION "0.0.1" CACHE INTERNAL "")
set(RVV_HAL_LIBRARIES ${HAL_LIB_NAME} CACHE INTERNAL "")
set(RVV_HAL_HEADERS "hal_rvv.hpp" CACHE INTERNAL "")
set(RVV_HAL_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}" CACHE INTERNAL "")

@@ -1,33 +0,0 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_HPP_INCLUDED
#define OPENCV_HAL_RVV_HPP_INCLUDED
#include "opencv2/core/hal/interface.h"
#ifndef CV_HAL_RVV_071_ENABLED
# if defined(__GNUC__) && __GNUC__ == 10 && __GNUC_MINOR__ == 4 && defined(__THEAD_VERSION__) && defined(__riscv_v) && __riscv_v == 7000
# define CV_HAL_RVV_071_ENABLED 1
# else
# define CV_HAL_RVV_071_ENABLED 0
# endif
#endif
#if CV_HAL_RVV_071_ENABLED
#include "version/hal_rvv_071.hpp"
#endif
#if defined(__riscv_v) && __riscv_v == 1000000
#include "hal_rvv_1p0/merge.hpp" // core
#include "hal_rvv_1p0/mean.hpp" // core
#include "hal_rvv_1p0/norm.hpp" // core
#include "hal_rvv_1p0/norm_diff.hpp" // core
#include "hal_rvv_1p0/convert_scale.hpp" // core
#include "hal_rvv_1p0/minmax.hpp" // core
#include "hal_rvv_1p0/atan.hpp" // core
#include "hal_rvv_1p0/split.hpp" // core
#endif
#endif
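
The removed header relies on the HAL override pattern that the per-function headers in this diff spell out: `#undef cv_hal_...` followed by a `#define` that points the hook at the RVV implementation, which may still return `CV_HAL_ERROR_NOT_IMPLEMENTED` for cases it does not cover. The following is a minimal sketch of that mechanism, not OpenCV code. The names `hal_double`, `hal_double_default`, `backend::double_values`, `double_all`, `HAL_OK` and `HAL_NOT_IMPLEMENTED` are hypothetical, and the caller-side fallback is an assumption about how such hooks are typically consumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes, standing in for CV_HAL_ERROR_OK and
// CV_HAL_ERROR_NOT_IMPLEMENTED.
constexpr int HAL_OK = 0;
constexpr int HAL_NOT_IMPLEMENTED = 1;

// Default hook: reports "not implemented" so the caller runs its own code.
inline int hal_double_default(const int*, int*, std::size_t) {
    return HAL_NOT_IMPLEMENTED;
}
#define hal_double hal_double_default

// A backend header redirects the hook to its own implementation.
namespace backend {
// Hypothetical backend: only accepts lengths that are a multiple of 4.
inline int double_values(const int* src, int* dst, std::size_t n) {
    if (n % 4 != 0)
        return HAL_NOT_IMPLEMENTED;
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * 2;
    return HAL_OK;
}
}  // namespace backend
#undef hal_double
#define hal_double backend::double_values

// Caller: try the hook first, fall back to portable code otherwise.
// Returns true when the backend handled the call.
inline bool double_all(const int* src, int* dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    if (hal_double(src, dst, n) == HAL_OK)
        return true;
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * 2;
    return false;
}
```

In this sketch both paths produce the same output; only the return value of `double_all` shows whether the backend or the portable loop did the work.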

@@ -1,128 +0,0 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level
// directory of this distribution and at http://opencv.org/license.html.
#pragma once
#undef cv_hal_fastAtan32f
#define cv_hal_fastAtan32f cv::cv_hal_rvv::fast_atan_32
#undef cv_hal_fastAtan64f
#define cv_hal_fastAtan64f cv::cv_hal_rvv::fast_atan_64
#include <riscv_vector.h>
#include <cfloat>
namespace cv::cv_hal_rvv {
namespace detail {
// ref: mathfuncs_core.simd.hpp
static constexpr float pi = CV_PI;
static constexpr float atan2_p1 = 0.9997878412794807F * (180 / pi);
static constexpr float atan2_p3 = -0.3258083974640975F * (180 / pi);
static constexpr float atan2_p5 = 0.1555786518463281F * (180 / pi);
static constexpr float atan2_p7 = -0.04432655554792128F * (180 / pi);
__attribute__((always_inline)) inline vfloat32m4_t
rvv_atan_f32(vfloat32m4_t vy, vfloat32m4_t vx, size_t vl, float p7,
vfloat32m4_t vp5, vfloat32m4_t vp3, vfloat32m4_t vp1,
float angle_90_deg) {
const auto ax = __riscv_vfabs(vx, vl);
const auto ay = __riscv_vfabs(vy, vl);
const auto c = __riscv_vfdiv(
__riscv_vfmin(ax, ay, vl),
__riscv_vfadd(__riscv_vfmax(ax, ay, vl), FLT_EPSILON, vl), vl);
const auto c2 = __riscv_vfmul(c, c, vl);
auto a = __riscv_vfmadd(c2, p7, vp5, vl);
a = __riscv_vfmadd(a, c2, vp3, vl);
a = __riscv_vfmadd(a, c2, vp1, vl);
a = __riscv_vfmul(a, c, vl);
const auto mask = __riscv_vmflt(ax, ay, vl);
a = __riscv_vfrsub_mu(mask, a, a, angle_90_deg, vl);
a = __riscv_vfrsub_mu(__riscv_vmflt(vx, 0.F, vl), a, a, angle_90_deg * 2,
vl);
a = __riscv_vfrsub_mu(__riscv_vmflt(vy, 0.F, vl), a, a, angle_90_deg * 4,
vl);
return a;
}
} // namespace detail
inline int fast_atan_32(const float *y, const float *x, float *dst, size_t n,
bool angle_in_deg) {
const float scale = angle_in_deg ? 1.f : CV_PI / 180.f;
const float p1 = detail::atan2_p1 * scale;
const float p3 = detail::atan2_p3 * scale;
const float p5 = detail::atan2_p5 * scale;
const float p7 = detail::atan2_p7 * scale;
const float angle_90_deg = 90.F * scale;
static size_t vlmax = __riscv_vsetvlmax_e32m4();
auto vp1 = __riscv_vfmv_v_f_f32m4(p1, vlmax);
auto vp3 = __riscv_vfmv_v_f_f32m4(p3, vlmax);
auto vp5 = __riscv_vfmv_v_f_f32m4(p5, vlmax);
for (size_t vl{}; n > 0; n -= vl) {
vl = __riscv_vsetvl_e32m4(n);
auto vy = __riscv_vle32_v_f32m4(y, vl);
auto vx = __riscv_vle32_v_f32m4(x, vl);
auto a =
detail::rvv_atan_f32(vy, vx, vl, p7, vp5, vp3, vp1, angle_90_deg);
__riscv_vse32(dst, a, vl);
x += vl;
y += vl;
dst += vl;
}
return CV_HAL_ERROR_OK;
}
inline int fast_atan_64(const double *y, const double *x, double *dst, size_t n,
bool angle_in_deg) {
// this also uses float32 version, ref: mathfuncs_core.simd.hpp
const float scale = angle_in_deg ? 1.f : CV_PI / 180.f;
const float p1 = detail::atan2_p1 * scale;
const float p3 = detail::atan2_p3 * scale;
const float p5 = detail::atan2_p5 * scale;
const float p7 = detail::atan2_p7 * scale;
const float angle_90_deg = 90.F * scale;
static size_t vlmax = __riscv_vsetvlmax_e32m4();
auto vp1 = __riscv_vfmv_v_f_f32m4(p1, vlmax);
auto vp3 = __riscv_vfmv_v_f_f32m4(p3, vlmax);
auto vp5 = __riscv_vfmv_v_f_f32m4(p5, vlmax);
for (size_t vl{}; n > 0; n -= vl) {
vl = __riscv_vsetvl_e64m8(n);
auto wy = __riscv_vle64_v_f64m8(y, vl);
auto wx = __riscv_vle64_v_f64m8(x, vl);
auto vy = __riscv_vfncvt_f_f_w_f32m4(wy, vl);
auto vx = __riscv_vfncvt_f_f_w_f32m4(wx, vl);
auto a =
detail::rvv_atan_f32(vy, vx, vl, p7, vp5, vp3, vp1, angle_90_deg);
auto wa = __riscv_vfwcvt_f_f_v_f64m8(a, vl);
__riscv_vse64(dst, wa, vl);
x += vl;
y += vl;
dst += vl;
}
return CV_HAL_ERROR_OK;
}
} // namespace cv::cv_hal_rvv
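
For readers less familiar with the RVV intrinsics, the arithmetic in `rvv_atan_f32` can be followed more easily in scalar form. The sketch below is a plain C++ rendering of the same degree-7 odd polynomial and the same three quadrant corrections, for the degrees case only. The function name `fast_atan2_deg` and the finite-input assertion are assumptions made for illustration; the coefficients are copied from the removed file.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>

// Scalar sketch of the polynomial evaluated by rvv_atan_f32 (result in degrees).
inline float fast_atan2_deg(float y, float x) {
    // Assumption of this sketch: inputs are finite.
    assert(std::isfinite(x) && std::isfinite(y));

    constexpr float kPi = 3.14159265358979323846f;
    constexpr float p1 = 0.9997878412794807f * (180.f / kPi);
    constexpr float p3 = -0.3258083974640975f * (180.f / kPi);
    constexpr float p5 = 0.1555786518463281f * (180.f / kPi);
    constexpr float p7 = -0.04432655554792128f * (180.f / kPi);

    const float ax = std::fabs(x);
    const float ay = std::fabs(y);
    // Ratio in [0, 1]; FLT_EPSILON avoids division by zero at the origin.
    const float c = std::min(ax, ay) / (std::max(ax, ay) + FLT_EPSILON);
    const float c2 = c * c;
    // Horner evaluation of p1*c + p3*c^3 + p5*c^5 + p7*c^7.
    float a = (((p7 * c2 + p5) * c2 + p3) * c2 + p1) * c;

    if (ax < ay) a = 90.f - a;    // angle is closer to the y axis
    if (x < 0.f) a = 180.f - a;   // left half-plane
    if (y < 0.f) a = 360.f - a;   // lower half-plane
    return a;
}
```

Each `__riscv_vfrsub_mu` call in the vector version corresponds to one of the three conditional lines at the end: the mask selects the lanes where the condition holds, and the reverse subtraction computes `constant - a` for those lanes only.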

@@ -1,397 +0,0 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_MERGE_HPP_INCLUDED
#define OPENCV_HAL_RVV_MERGE_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_merge8u
#define cv_hal_merge8u cv::cv_hal_rvv::merge8u
#undef cv_hal_merge16u
#define cv_hal_merge16u cv::cv_hal_rvv::merge16u
#undef cv_hal_merge32s
#define cv_hal_merge32s cv::cv_hal_rvv::merge32s
#undef cv_hal_merge64s
#define cv_hal_merge64s cv::cv_hal_rvv::merge64s
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge8u(const uchar** src, uchar* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i = 0;
int vl = __riscv_vsetvlmax_e8m1();
if( k == 1 )
{
const uchar* src0 = src[0];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++)
dst[i*cn] = src0[i];
}
else if( k == 2 )
{
const uchar *src0 = src[0], *src1 = src[1];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 1, sizeof(uchar)*cn, b, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
}
}
else if( k == 3 )
{
const uchar *src0 = src[0], *src1 = src[1], *src2 = src[2];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
auto c = __riscv_vle8_v_u8m1(src2 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 1, sizeof(uchar)*cn, b, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 2, sizeof(uchar)*cn, c, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
}
}
else
{
const uchar *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
auto c = __riscv_vle8_v_u8m1(src2 + i, vl);
auto d = __riscv_vle8_v_u8m1(src3 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 1, sizeof(uchar)*cn, b, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 2, sizeof(uchar)*cn, c, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 3, sizeof(uchar)*cn, d, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
dst[i*cn+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const uchar *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
i = 0;
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
auto c = __riscv_vle8_v_u8m1(src2 + i, vl);
auto d = __riscv_vle8_v_u8m1(src3 + i, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn + 1, sizeof(uchar)*cn, b, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn + 2, sizeof(uchar)*cn, c, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn + 3, sizeof(uchar)*cn, d, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[k+i*cn] = src0[i];
dst[k+i*cn+1] = src1[i];
dst[k+i*cn+2] = src2[i];
dst[k+i*cn+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge16u(const ushort** src, ushort* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i = 0;
int vl = __riscv_vsetvlmax_e16m1();
if( k == 1 )
{
const ushort* src0 = src[0];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++)
dst[i*cn] = src0[i];
}
else if( k == 2 )
{
const ushort *src0 = src[0], *src1 = src[1];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 1, sizeof(ushort)*cn, b, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
}
}
else if( k == 3 )
{
const ushort *src0 = src[0], *src1 = src[1], *src2 = src[2];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
auto c = __riscv_vle16_v_u16m1(src2 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 1, sizeof(ushort)*cn, b, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 2, sizeof(ushort)*cn, c, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
}
}
else
{
const ushort *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
auto c = __riscv_vle16_v_u16m1(src2 + i, vl);
auto d = __riscv_vle16_v_u16m1(src3 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 1, sizeof(ushort)*cn, b, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 2, sizeof(ushort)*cn, c, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 3, sizeof(ushort)*cn, d, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
dst[i*cn+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const uint16_t *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
i = 0;
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
auto c = __riscv_vle16_v_u16m1(src2 + i, vl);
auto d = __riscv_vle16_v_u16m1(src3 + i, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn + 1, sizeof(ushort)*cn, b, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn + 2, sizeof(ushort)*cn, c, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn + 3, sizeof(ushort)*cn, d, vl);
}
for( ; i < len; i++ )
{
dst[k+i*cn] = src0[i];
dst[k+i*cn+1] = src1[i];
dst[k+i*cn+2] = src2[i];
dst[k+i*cn+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge32s(const int** src, int* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i, j;
if( k == 1 )
{
const int* src0 = src[0];
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( i = j = 0; i < len; i++, j += cn )
dst[j] = src0[i];
}
else if( k == 2 )
{
const int *src0 = src[0], *src1 = src[1];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
}
}
else if( k == 3 )
{
const int *src0 = src[0], *src1 = src[1], *src2 = src[2];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
dst[j+2] = src2[i];
}
}
else
{
const int *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const int *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
for( i = 0, j = k; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge64s(const int64** src, int64* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i, j;
if( k == 1 )
{
const int64* src0 = src[0];
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( i = j = 0; i < len; i++, j += cn )
dst[j] = src0[i];
}
else if( k == 2 )
{
const int64 *src0 = src[0], *src1 = src[1];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
}
}
else if( k == 3 )
{
const int64 *src0 = src[0], *src1 = src[1], *src2 = src[2];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
dst[j+2] = src2[i];
}
}
else
{
const int64 *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const int64 *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
for( i = 0, j = k; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
}}
#endif
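
All four removed `merge*` functions implement the same interleaving: element `i` of plane `c` lands at `dst[i*cn + c]`, with the channels processed in a first block of `cn % 4 ? cn % 4 : 4` planes followed by groups of four. A compact scalar reference, useful when checking a vectorized replacement against expected output, might look like the sketch below. The names `merge_scalar` and `first_block_channels` are hypothetical and do not appear in OpenCV.

```cpp
#include <cassert>

// Size of the first channel block, chosen so that the remaining channel
// count is a multiple of 4 (mirrors `cn % 4 ? cn % 4 : 4` in the diff).
inline int first_block_channels(int cn) {
    return cn % 4 ? cn % 4 : 4;
}

// Interleave `cn` planar arrays of `len` elements each into `dst`.
template <typename T>
void merge_scalar(const T* const* src, T* dst, int len, int cn) {
    assert(len >= 0 && cn > 0);
    for (int c = 0; c < cn; ++c) {
        const T* plane = src[c];
        for (int i = 0; i < len; ++i)
            dst[i * cn + c] = plane[i];  // stride of cn elements per pixel
    }
}
```

The strided stores in `merge8u` and `merge16u` (`__riscv_vsse8_v_u8m1` with a byte stride of `sizeof(T)*cn`) perform the inner loop of this sketch for one vector of elements at a time.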

@@ -1,335 +0,0 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_MINMAXIDX_HPP_INCLUDED
#define OPENCV_HAL_RVV_MINMAXIDX_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_minMaxIdx
#define cv_hal_minMaxIdx cv::cv_hal_rvv::minMaxIdx
#undef cv_hal_minMaxIdxMaskStep
#define cv_hal_minMaxIdxMaskStep cv::cv_hal_rvv::minMaxIdx
namespace
{
template<typename T> struct rvv;
#define HAL_RVV_GENERATOR(T, EEW, TYPE, IS_U, EMUL, M_EMUL, B_LEN) \
template<> struct rvv<T> \
{ \
using vec_t = v##IS_U##int##EEW##EMUL##_t; \
using bool_t = vbool##B_LEN##_t; \
static inline size_t vsetvlmax() { return __riscv_vsetvlmax_e##EEW##EMUL(); } \
static inline size_t vsetvl(size_t a) { return __riscv_vsetvl_e##EEW##EMUL(a); } \
static inline vec_t vmv_v_x(T a, size_t b) { return __riscv_vmv_v_x_##TYPE##EMUL(a, b); } \
static inline vec_t vle(const T* a, size_t b) { return __riscv_vle##EEW##_v_##TYPE##EMUL(a, b); } \
static inline vuint8##M_EMUL##_t vle_mask(const uchar* a, size_t b) { return __riscv_vle8_v_u8##M_EMUL(a, b); } \
static inline vec_t vmin_tu(vec_t a, vec_t b, vec_t c, size_t d) { return __riscv_vmin##IS_U##_tu(a, b, c, d); } \
static inline vec_t vmax_tu(vec_t a, vec_t b, vec_t c, size_t d) { return __riscv_vmax##IS_U##_tu(a, b, c, d); } \
static inline vec_t vmin_tumu(bool_t a, vec_t b, vec_t c, vec_t d, size_t e) { return __riscv_vmin##IS_U##_tumu(a, b, c, d, e); } \
static inline vec_t vmax_tumu(bool_t a, vec_t b, vec_t c, vec_t d, size_t e) { return __riscv_vmax##IS_U##_tumu(a, b, c, d, e); } \
static inline vec_t vredmin(vec_t a, vec_t b, size_t c) { return __riscv_vredmin##IS_U(a, b, c); } \
static inline vec_t vredmax(vec_t a, vec_t b, size_t c) { return __riscv_vredmax##IS_U(a, b, c); } \
};
HAL_RVV_GENERATOR(uchar , 8 , u8 , u, m1, m1 , 8 )
HAL_RVV_GENERATOR(schar , 8 , i8 , , m1, m1 , 8 )
HAL_RVV_GENERATOR(ushort, 16, u16, u, m1, mf2, 16)
HAL_RVV_GENERATOR(short , 16, i16, , m1, mf2, 16)
#undef HAL_RVV_GENERATOR
#define HAL_RVV_GENERATOR(T, NAME, EEW, TYPE, IS_F, F_OR_S, F_OR_X, EMUL, M_EMUL, P_EMUL, B_LEN) \
template<> struct rvv<T> \
{ \
using vec_t = v##NAME##EEW##EMUL##_t; \
using bool_t = vbool##B_LEN##_t; \
static inline size_t vsetvlmax() { return __riscv_vsetvlmax_e##EEW##EMUL(); } \
static inline size_t vsetvl(size_t a) { return __riscv_vsetvl_e##EEW##EMUL(a); } \
static inline vec_t vmv_v_x(T a, size_t b) { return __riscv_v##IS_F##mv_v_##F_OR_X##_##TYPE##EMUL(a, b); } \
static inline vuint32##P_EMUL##_t vid(size_t a) { return __riscv_vid_v_u32##P_EMUL(a); } \
static inline vuint32##P_EMUL##_t vundefined() { return __riscv_vundefined_u32##P_EMUL(); } \
static inline vec_t vle(const T* a, size_t b) { return __riscv_vle##EEW##_v_##TYPE##EMUL(a, b); } \
static inline vuint8##M_EMUL##_t vle_mask(const uchar* a, size_t b) { return __riscv_vle8_v_u8##M_EMUL(a, b); } \
static inline bool_t vmlt(vec_t a, vec_t b, size_t c) { return __riscv_vm##F_OR_S##lt(a, b, c); } \
static inline bool_t vmgt(vec_t a, vec_t b, size_t c) { return __riscv_vm##F_OR_S##gt(a, b, c); } \
static inline bool_t vmlt_mu(bool_t a, bool_t b, vec_t c, vec_t d, size_t e) { return __riscv_vm##F_OR_S##lt##_mu(a, b, c, d, e); } \
static inline bool_t vmgt_mu(bool_t a, bool_t b, vec_t c, vec_t d, size_t e) { return __riscv_vm##F_OR_S##gt##_mu(a, b, c, d, e); } \
static inline T vmv_x_s(vec_t a) { return __riscv_v##IS_F##mv_##F_OR_X(a); } \
};
HAL_RVV_GENERATOR(int , int , 32, i32, , s, x, m4, m1 , m4, 8 )
HAL_RVV_GENERATOR(float , float, 32, f32, f, f, f, m4, m1 , m4, 8 )
HAL_RVV_GENERATOR(double, float, 64, f64, f, f, f, m4, mf2, m2, 16)
#undef HAL_RVV_GENERATOR
}
template<typename T>
inline int minMaxIdxReadTwice(const uchar* src_data, size_t src_step, int width, int height, double* minVal, double* maxVal,
int* minIdx, int* maxIdx, uchar* mask, size_t mask_step)
{
int vlmax = rvv<T>::vsetvlmax();
auto vec_min = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto vec_max = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
T val_min, val_max;
if (mask)
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto vec_mask = rvv<T>::vle_mask(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
vec_min = rvv<T>::vmin_tumu(bool_mask, vec_min, vec_min, vec_src, vl);
vec_max = rvv<T>::vmax_tumu(bool_mask, vec_max, vec_max, vec_src, vl);
}
}
auto sc_minval = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto sc_maxval = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
sc_minval = rvv<T>::vredmin(vec_min, sc_minval, vlmax);
sc_maxval = rvv<T>::vredmax(vec_max, sc_maxval, vlmax);
val_min = __riscv_vmv_x(sc_minval);
val_max = __riscv_vmv_x(sc_maxval);
bool found_min = !minIdx, found_max = !maxIdx;
for (int i = 0; i < height && (!found_min || !found_max); i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width && (!found_min || !found_max); j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto vec_mask = rvv<T>::vle_mask(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto bool_zero = __riscv_vmxor(bool_mask, bool_mask, vl);
if (!found_min)
{
auto bool_minpos = __riscv_vmseq_mu(bool_mask, bool_zero, vec_src, val_min, vl);
int index = __riscv_vfirst(bool_minpos, vl);
if (index != -1)
{
found_min = true;
minIdx[0] = i;
minIdx[1] = j + index;
}
}
if (!found_max)
{
auto bool_maxpos = __riscv_vmseq_mu(bool_mask, bool_zero, vec_src, val_max, vl);
int index = __riscv_vfirst(bool_maxpos, vl);
if (index != -1)
{
found_max = true;
maxIdx[0] = i;
maxIdx[1] = j + index;
}
}
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
vec_min = rvv<T>::vmin_tu(vec_min, vec_min, vec_src, vl);
vec_max = rvv<T>::vmax_tu(vec_max, vec_max, vec_src, vl);
}
}
auto sc_minval = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto sc_maxval = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
sc_minval = rvv<T>::vredmin(vec_min, sc_minval, vlmax);
sc_maxval = rvv<T>::vredmax(vec_max, sc_maxval, vlmax);
val_min = __riscv_vmv_x(sc_minval);
val_max = __riscv_vmv_x(sc_maxval);
bool found_min = !minIdx, found_max = !maxIdx;
for (int i = 0; i < height && (!found_min || !found_max); i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
int vl;
for (int j = 0; j < width && (!found_min || !found_max); j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
if (!found_min)
{
auto bool_minpos = __riscv_vmseq(vec_src, val_min, vl);
int index = __riscv_vfirst(bool_minpos, vl);
if (index != -1)
{
found_min = true;
minIdx[0] = i;
minIdx[1] = j + index;
}
}
if (!found_max)
{
auto bool_maxpos = __riscv_vmseq(vec_src, val_max, vl);
int index = __riscv_vfirst(bool_maxpos, vl);
if (index != -1)
{
found_max = true;
maxIdx[0] = i;
maxIdx[1] = j + index;
}
}
}
}
}
if (minVal)
{
*minVal = val_min;
}
if (maxVal)
{
*maxVal = val_max;
}
return CV_HAL_ERROR_OK;
}
template<typename T>
inline int minMaxIdxReadOnce(const uchar* src_data, size_t src_step, int width, int height, double* minVal, double* maxVal,
int* minIdx, int* maxIdx, uchar* mask, size_t mask_step)
{
int vlmax = rvv<T>::vsetvlmax();
auto vec_min = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto vec_max = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
auto vec_pos = rvv<T>::vid(vlmax);
auto vec_minpos = rvv<T>::vundefined(), vec_maxpos = rvv<T>::vundefined();
T val_min, val_max;
if (mask)
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto vec_mask = rvv<T>::vle_mask(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto bool_zero = __riscv_vmxor(bool_mask, bool_mask, vl);
auto bool_minpos = rvv<T>::vmlt_mu(bool_mask, bool_zero, vec_src, vec_min, vl);
auto bool_maxpos = rvv<T>::vmgt_mu(bool_mask, bool_zero, vec_src, vec_max, vl);
vec_minpos = __riscv_vmerge_tu(vec_minpos, vec_minpos, vec_pos, bool_minpos, vl);
vec_maxpos = __riscv_vmerge_tu(vec_maxpos, vec_maxpos, vec_pos, bool_maxpos, vl);
vec_min = __riscv_vmerge_tu(vec_min, vec_min, vec_src, bool_minpos, vl);
vec_max = __riscv_vmerge_tu(vec_max, vec_max, vec_src, bool_maxpos, vl);
vec_pos = __riscv_vadd(vec_pos, vl, vlmax);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto bool_minpos = rvv<T>::vmlt(vec_src, vec_min, vl);
auto bool_maxpos = rvv<T>::vmgt(vec_src, vec_max, vl);
vec_minpos = __riscv_vmerge_tu(vec_minpos, vec_minpos, vec_pos, bool_minpos, vl);
vec_maxpos = __riscv_vmerge_tu(vec_maxpos, vec_maxpos, vec_pos, bool_maxpos, vl);
vec_min = __riscv_vmerge_tu(vec_min, vec_min, vec_src, bool_minpos, vl);
vec_max = __riscv_vmerge_tu(vec_max, vec_max, vec_src, bool_maxpos, vl);
vec_pos = __riscv_vadd(vec_pos, vl, vlmax);
}
}
}
val_min = std::numeric_limits<T>::max();
val_max = std::numeric_limits<T>::lowest();
for (int i = 0; i < vlmax; i++)
{
if (val_min > rvv<T>::vmv_x_s(vec_min))
{
val_min = rvv<T>::vmv_x_s(vec_min);
if (minIdx)
{
minIdx[0] = __riscv_vmv_x(vec_minpos) / width;
minIdx[1] = __riscv_vmv_x(vec_minpos) % width;
}
}
if (val_max < rvv<T>::vmv_x_s(vec_max))
{
val_max = rvv<T>::vmv_x_s(vec_max);
if (maxIdx)
{
maxIdx[0] = __riscv_vmv_x(vec_maxpos) / width;
maxIdx[1] = __riscv_vmv_x(vec_maxpos) % width;
}
}
vec_min = __riscv_vslidedown(vec_min, 1, vlmax);
vec_max = __riscv_vslidedown(vec_max, 1, vlmax);
vec_minpos = __riscv_vslidedown(vec_minpos, 1, vlmax);
vec_maxpos = __riscv_vslidedown(vec_maxpos, 1, vlmax);
}
if (minVal)
{
*minVal = val_min;
}
if (maxVal)
{
*maxVal = val_max;
}
return CV_HAL_ERROR_OK;
}
inline int minMaxIdx(const uchar* src_data, size_t src_step, int width, int height, int depth, double* minVal, double* maxVal,
int* minIdx, int* maxIdx, uchar* mask, size_t mask_step = 0)
{
if (!mask_step)
mask_step = src_step;
switch (depth)
{
case CV_8UC1:
return minMaxIdxReadTwice<uchar>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_8SC1:
return minMaxIdxReadTwice<schar>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_16UC1:
return minMaxIdxReadTwice<ushort>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_16SC1:
return minMaxIdxReadTwice<short>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_32SC1:
return minMaxIdxReadOnce<int>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_32FC1:
return minMaxIdxReadOnce<float>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_64FC1:
return minMaxIdxReadOnce<double>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
}}
#endif
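
The `minMaxIdxReadTwice` variant used for 8-bit and 16-bit types separates the work into two passes: the first computes only the extreme values, and the second scans for the first position of each value, stopping as soon as both are found. A scalar sketch of that strategy is shown below. It is a simplification under stated assumptions: the mask is omitted, the row stride is given in elements rather than bytes, and `MinMaxLoc` and `min_max_two_pass` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
struct MinMaxLoc {
    double minVal, maxVal;
    int minRow, minCol, maxRow, maxCol;
};

template <typename T>
MinMaxLoc min_max_two_pass(const T* data, std::size_t row_stride,
                           int width, int height) {
    assert(data != nullptr && width > 0 && height > 0);

    // Pass 1: values only, no index bookkeeping.
    T lo = data[0], hi = data[0];
    for (int i = 0; i < height; ++i) {
        const T* row = data + i * row_stride;
        for (int j = 0; j < width; ++j) {
            if (row[j] < lo) lo = row[j];
            if (row[j] > hi) hi = row[j];
        }
    }

    // Pass 2: first occurrence of each extreme, with early exit.
    MinMaxLoc r{static_cast<double>(lo), static_cast<double>(hi), -1, -1, -1, -1};
    for (int i = 0; i < height && (r.minRow < 0 || r.maxRow < 0); ++i) {
        const T* row = data + i * row_stride;
        for (int j = 0; j < width && (r.minRow < 0 || r.maxRow < 0); ++j) {
            if (r.minRow < 0 && row[j] == lo) { r.minRow = i; r.minCol = j; }
            if (r.maxRow < 0 && row[j] == hi) { r.maxRow = i; r.maxCol = j; }
        }
    }
    return r;
}
```

The `minMaxIdxReadOnce` variant trades the second pass for per-lane position tracking (`vec_minpos`, `vec_maxpos`), which the diff applies to the 32-bit and 64-bit types.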

@@ -1,517 +0,0 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_NORM_HPP_INCLUDED
#define OPENCV_HAL_RVV_NORM_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_norm
#define cv_hal_norm cv::cv_hal_rvv::norm
inline int normInf_8UC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m8(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
vec_max = __riscv_vmaxu_tumu(bool_mask, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normL1_8UC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normL2Sqr_8UC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
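// Flush the 32-bit lane accumulators into *result before they can overflow:
// each lane gains at most 255 * 255 per step, so fewer than 2^16 steps stay below 2^32.
auto reduce = [&](int vl) {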
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normInf_8UC4(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
vlm = __riscv_vsetvl_e8m2(width - jm);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + jm, vlm);
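// Expand each mask byte to four bytes (one per channel): clamp to 0/1, zero-extend
// to 32 bits, multiply by 0x01010101, then reinterpret as bytes. Relies on vl == 4 * vlm.
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);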
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m8(vec_mask_ext), 0, vl);
vec_max = __riscv_vmaxu_tumu(bool_mask_ext, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normL1_8UC4(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask_ext, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normL2Sqr_8UC4(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
auto reduce = [&](int vl) {
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask_ext, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask_ext, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normInf_32FC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m8();
auto vec_max = __riscv_vfmv_v_f_f32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src = __riscv_vle32_v_f32m8(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_abs = __riscv_vfabs_v_f32m8_m(bool_mask, vec_src, vl);
vec_max = __riscv_vfmax_tumu(bool_mask, vec_max, vec_max, vec_abs, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src = __riscv_vle32_v_f32m8(src_row + j, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
vec_max = __riscv_vfmax_tu(vec_max, vec_max, vec_abs, vl);
}
}
}
auto sc_max = __riscv_vfmv_s_f_f32m1(0, vlmax);
sc_max = __riscv_vfredmax(vec_max, sc_max, vlmax);
*result = __riscv_vfmv_f(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normL1_32FC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_abs = __riscv_vfabs_v_f32m4_m(bool_mask, vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8_m(bool_mask, vec_abs, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_fext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8(vec_abs, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_fext, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normL2Sqr_32FC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_mul = __riscv_vfwmul_vv_f64m8_m(bool_mask, vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_mul, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_mul = __riscv_vfwmul(vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_mul, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int norm(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width,
int height, int type, int norm_type, double* result)
{
if (!result)
return CV_HAL_ERROR_OK;
switch (type)
{
case CV_8UC1:
switch (norm_type)
{
case NORM_INF:
return normInf_8UC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L1:
return normL1_8UC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2SQR:
return normL2Sqr_8UC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2:
int ret = normL2Sqr_8UC1(src, src_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
return ret;
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
case CV_8UC4:
switch (norm_type)
{
case NORM_INF:
return normInf_8UC4(src, src_step, mask, mask_step, width, height, result);
case NORM_L1:
return normL1_8UC4(src, src_step, mask, mask_step, width, height, result);
case NORM_L2SQR:
return normL2Sqr_8UC4(src, src_step, mask, mask_step, width, height, result);
case NORM_L2:
int ret = normL2Sqr_8UC4(src, src_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
return ret;
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
case CV_32FC1:
switch (norm_type)
{
case NORM_INF:
return normInf_32FC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L1:
return normL1_32FC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2SQR:
return normL2Sqr_32FC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2:
int ret = normL2Sqr_32FC1(src, src_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
return ret;
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
}}
#endif
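
The vectorized kernels in this header are easier to check against a plain scalar version. The following is a minimal sketch of such a reference for single-channel 8-bit data, assuming the same row-stride and mask conventions as the functions above (a null mask selects every pixel, a non-zero mask byte selects that pixel). `RefNorm` and `refNorm8U` are hypothetical names introduced only for illustration; they are not part of OpenCV.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of OpenCV: selects which norm to accumulate.
enum class RefNorm { Inf, L1, L2Sqr };

// Scalar reference for the masked 8UC1 norms. Steps are given in bytes.
inline double refNorm8U(const std::uint8_t* src, std::size_t src_step,
                        const std::uint8_t* mask, std::size_t mask_step,
                        int width, int height, RefNorm kind)
{
    assert(src != nullptr && width >= 0 && height >= 0);
    double acc = 0.0;
    for (int i = 0; i < height; i++)
    {
        const std::uint8_t* src_row = src + i * src_step;
        const std::uint8_t* mask_row = mask ? mask + i * mask_step : nullptr;
        for (int j = 0; j < width; j++)
        {
            if (mask_row && mask_row[j] == 0)
                continue; // masked-out pixels do not contribute
            const double v = src_row[j];
            switch (kind)
            {
            case RefNorm::Inf:   acc = std::max(acc, v); break;
            case RefNorm::L1:    acc += v; break;
            case RefNorm::L2Sqr: acc += v * v; break;
            }
        }
    }
    return acc;
}
```

Accumulating in `double` sidesteps the overflow handling that the 32-bit vector accumulators need, which is what makes this form convenient as a test oracle rather than as a fast path.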

@ -1,605 +0,0 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_NORM_DIFF_HPP_INCLUDED
#define OPENCV_HAL_RVV_NORM_DIFF_HPP_INCLUDED
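#include <riscv_vector.h>
#include <cfloat> // DBL_EPSILON, used by the NORM_RELATIVE path
#include <cmath>  // std::sqrt, used by the NORM_L2 cases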
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_normDiff
#define cv_hal_normDiff cv::cv_hal_rvv::normDiff
inline int normDiffInf_8UC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m8(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vsub_vv_u8m8_m(bool_mask, __riscv_vmaxu_vv_u8m8_m(bool_mask, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m8_m(bool_mask, vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tumu(bool_mask, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
// |src1 - src2| without widening: max(a, b) - min(a, b) cannot wrap in unsigned 8-bit arithmetic.
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normDiffL1_8UC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask, __riscv_vmaxu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiffL2Sqr_8UC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
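// Flush the 32-bit lane accumulators into *result before they can overflow:
// each squared difference is at most 255 * 255, so fewer than 2^16 steps stay below 2^32.
auto reduce = [&](int vl) {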
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask, __riscv_vmaxu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normDiffInf_8UC4(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
vlm = __riscv_vsetvl_e8m2(width - jm);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m8(vec_mask_ext), 0, vl);
auto vec_src = __riscv_vsub_vv_u8m8_m(bool_mask_ext, __riscv_vmaxu_vv_u8m8_m(bool_mask_ext, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m8_m(bool_mask_ext, vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tumu(bool_mask_ext, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normDiffL1_8UC4(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask_ext, __riscv_vmaxu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask_ext, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiffL2Sqr_8UC4(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
auto reduce = [&](int vl) {
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask_ext, __riscv_vmaxu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask_ext, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask_ext, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normDiffInf_32FC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m8();
auto vec_max = __riscv_vfmv_v_f_f32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src1 = __riscv_vle32_v_f32m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m8(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vfsub_vv_f32m8_m(bool_mask, vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs_v_f32m8_m(bool_mask, vec_src, vl);
vec_max = __riscv_vfmax_tumu(bool_mask, vec_max, vec_max, vec_abs, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src1 = __riscv_vle32_v_f32m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m8(src2_row + j, vl);
auto vec_src = __riscv_vfsub(vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
vec_max = __riscv_vfmax_tu(vec_max, vec_max, vec_abs, vl);
}
}
}
auto sc_max = __riscv_vfmv_s_f_f32m1(0, vlmax);
sc_max = __riscv_vfredmax(vec_max, sc_max, vlmax);
*result = __riscv_vfmv_f(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normDiffL1_32FC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vfsub_vv_f32m4_m(bool_mask, vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs_v_f32m4_m(bool_mask, vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8_m(bool_mask, vec_abs, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_fext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_src = __riscv_vfsub(vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8(vec_abs, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_fext, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiffL2Sqr_32FC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vfsub_vv_f32m4_m(bool_mask, vec_src1, vec_src2, vl);
auto vec_mul = __riscv_vfwmul_vv_f64m8_m(bool_mask, vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_mul, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_src = __riscv_vfsub(vec_src1, vec_src2, vl);
auto vec_mul = __riscv_vfwmul(vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_mul, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiff(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask,
size_t mask_step, int width, int height, int type, int norm_type, double* result)
{
if (!result)
return CV_HAL_ERROR_OK;
int ret;
switch (type)
{
case CV_8UC1:
switch (norm_type & ~NORM_RELATIVE)
{
case NORM_INF:
ret = normDiffInf_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L1:
ret = normDiffL1_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2SQR:
ret = normDiffL2Sqr_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2:
ret = normDiffL2Sqr_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
break;
case CV_8UC4:
switch (norm_type & ~NORM_RELATIVE)
{
case NORM_INF:
ret = normDiffInf_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L1:
ret = normDiffL1_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2SQR:
ret = normDiffL2Sqr_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2:
ret = normDiffL2Sqr_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
break;
case CV_32FC1:
switch (norm_type & ~NORM_RELATIVE)
{
case NORM_INF:
ret = normDiffInf_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L1:
ret = normDiffL1_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2SQR:
ret = normDiffL2Sqr_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2:
ret = normDiffL2Sqr_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if(ret == CV_HAL_ERROR_OK && (norm_type & NORM_RELATIVE))
{
double result_;
ret = cv::cv_hal_rvv::norm(src2, src2_step, mask, mask_step, width, height, type, norm_type & ~NORM_RELATIVE, &result_);
if(ret == CV_HAL_ERROR_OK)
{
*result /= result_ + DBL_EPSILON;
}
}
return ret;
}
}}
#endif
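
Review note: the relative-norm tail of `normDiff` can be checked against a scalar reference. The following is a minimal sketch, assuming single-channel float data, no mask, and contiguous rows. `norm_l2sqr_ref` and `norm_diff_l2sqr_ref` are hypothetical names for illustration and are not OpenCV APIs. It mirrors the `NORM_L2SQR` path: subtract in float, widen to double before accumulating, and for the relative variant divide by the same norm of the second input plus `DBL_EPSILON`.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical scalar helper: sum of squares of a float buffer, accumulated in double. */
static double norm_l2sqr_ref(const float *src, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)src[i] * (double)src[i];
    return sum;
}

/* Hypothetical scalar helper: squared L2 norm of (a - b).
 * If relative is nonzero, the result is divided by the squared L2 norm of b
 * plus DBL_EPSILON, which is what the dispatch does for
 * NORM_L2SQR | NORM_RELATIVE. */
static double norm_diff_l2sqr_ref(const float *a, const float *b, size_t n, int relative)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];          /* float subtraction, like vfsub */
        sum += (double)d * (double)d;   /* widened product, like vfwmul */
    }
    if (relative)
        sum /= norm_l2sqr_ref(b, n) + DBL_EPSILON;
    return sum;
}
```

For `a = {3, 0}` and `b = {0, 4}` the plain result is 25 and the relative result is close to 25/16. The `DBL_EPSILON` term only matters when the second input is all zeros, where it prevents a division by zero.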


@ -1,109 +0,0 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_071_HPP_INCLUDED
#define OPENCV_HAL_RVV_071_HPP_INCLUDED
#include <riscv_vector.h>
#include <limits>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_cvtBGRtoBGR
#define cv_hal_cvtBGRtoBGR cv::cv_hal_rvv::cvtBGRtoBGR
static const unsigned char index_array_32 [32]
{ 2, 1, 0, 3, 6, 5, 4, 7, 10, 9, 8, 11, 14, 13, 12, 15, 18, 17, 16, 19, 22, 21, 20, 23, 26, 25, 24, 27, 30, 29, 28, 31 };
static const unsigned char index_array_24 [24]
{ 2, 1, 0, 5, 4, 3, 8, 7, 6, 11, 10, 9, 14, 13, 12, 17, 16, 15, 20, 19, 18, 23, 22, 21 };
static void vBGRtoBGR(const unsigned char* src, unsigned char * dst, const unsigned char * index, int n, int scn, int dcn, int vsize_pixels, const int vsize)
{
vuint8m2_t vec_index = vle8_v_u8m2(index, vsize);
int i = 0;
for ( ; i <= n-vsize; i += vsize_pixels, src += vsize, dst += vsize)
{
vuint8m2_t vec_src = vle8_v_u8m2(src, vsize);
vuint8m2_t vec_dst = vrgather_vv_u8m2(vec_src, vec_index, vsize);
vse8_v_u8m2(dst, vec_dst, vsize);
}
for ( ; i < n; i++, src += scn, dst += dcn )
{
unsigned char t0 = src[0], t1 = src[1], t2 = src[2];
dst[2] = t0;
dst[1] = t1;
dst[0] = t2;
if(dcn == 4)
{
unsigned char d = src[3];
dst[3] = d;
}
}
}
static void sBGRtoBGR(const unsigned char* src, unsigned char * dst, int n, int scn, int dcn, int bi)
{
for (int i = 0; i < n; i++, src += scn, dst += dcn)
{
unsigned char t0 = src[0], t1 = src[1], t2 = src[2];
dst[bi ] = t0;
dst[1] = t1;
dst[bi^2] = t2;
if(dcn == 4)
{
unsigned char d = scn == 4 ? src[3] : std::numeric_limits<unsigned char>::max();
dst[3] = d;
}
}
}
static int cvtBGRtoBGR(const unsigned char * src_data, size_t src_step, unsigned char * dst_data, size_t dst_step, int width, int height, int depth, int scn, int dcn, bool swapBlue)
{
if (depth != CV_8U)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
const int blueIdx = swapBlue ? 2 : 0;
if (scn == dcn)
{
if (!swapBlue)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
const int vsize_pixels = 8;
if (scn == 4)
{
for (int i = 0; i < height; i++, src_data += src_step, dst_data += dst_step)
{
vBGRtoBGR(src_data, dst_data, index_array_32, width, scn, dcn, vsize_pixels, 32);
}
}
else
{
for (int i = 0; i < height; i++, src_data += src_step, dst_data += dst_step)
{
vBGRtoBGR(src_data, dst_data, index_array_24, width, scn, dcn, vsize_pixels, 24);
}
}
}
else
{
for (int i = 0; i < height; i++, src_data += src_step, dst_data += dst_step)
sBGRtoBGR(src_data, dst_data, width, scn, dcn, blueIdx);
}
return CV_HAL_ERROR_OK;
}
}}
#endif
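
Review note: the removed header swapped the blue and red channels with a register gather driven by `index_array_24` and `index_array_32`. The following is a minimal scalar sketch of that idea, assuming 8-bit data and 3 or 4 channels per pixel. `build_swap_index` and `gather_bytes` are hypothetical names for illustration and do not appear in the removed file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill index[] so that a gather through it swaps
 * channels 0 and 2 of every pixel. cn is the channel count (3 or 4).
 * The caller provides room for pixels * cn entries, and pixels * cn must
 * not exceed 256 because the indices are stored as bytes. */
static void build_swap_index(unsigned char *index, size_t pixels, int cn)
{
    assert(cn == 3 || cn == 4);
    assert(pixels * (size_t)cn <= 256);
    for (size_t p = 0; p < pixels; p++) {
        size_t base = p * (size_t)cn;
        index[base + 0] = (unsigned char)(base + 2);
        index[base + 1] = (unsigned char)(base + 1);
        index[base + 2] = (unsigned char)(base + 0);
        if (cn == 4)
            index[base + 3] = (unsigned char)(base + 3); /* alpha stays in place */
    }
}

/* Scalar equivalent of a register gather: dst[i] = src[index[i]]. */
static void gather_bytes(const unsigned char *src, unsigned char *dst,
                         const unsigned char *index, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[index[i]];
}
```

With 8 pixels, `build_swap_index` reproduces the two tables from the removed file: `cn = 3` gives `2, 1, 0, 5, 4, 3, ...` and `cn = 4` gives `2, 1, 0, 3, 6, 5, 4, 7, ...`.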


@ -18,7 +18,7 @@ if(CV_GCC AND NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 13)
ocv_warnings_disable(CMAKE_C_FLAGS -Wstringop-overflow)
endif()
set(VERSION 3.0.3)
set(VERSION 3.1.0)
set(COPYRIGHT_YEAR "1991-2024")
string(REPLACE "." ";" VERSION_TRIPLET ${VERSION})
list(GET VERSION_TRIPLET 0 VERSION_MAJOR)
@ -203,7 +203,7 @@ check_type_size("size_t" SIZE_T)
check_type_size("unsigned long" UNSIGNED_LONG)
if(ENABLE_LIBJPEG_TURBO_SIMD)
add_subdirectory(src/simd)
add_subdirectory(simd)
if(NEON_INTRINSICS)
add_definitions(-DNEON_INTRINSICS)
endif()


@ -94,7 +94,7 @@ intended solely for clarification.
The Modified (3-clause) BSD License
===================================
Copyright (C)2009-2023 D. R. Commander. All Rights Reserved.<br>
Copyright (C)2009-2024 D. R. Commander. All Rights Reserved.<br>
Copyright (C)2015 Viktor Szathmáry. All Rights Reserved.
Redistribution and use in source and binary forms, with or without


@ -36,16 +36,18 @@ TO DO Plans for future IJG releases.
Other documentation files in the distribution are:
User documentation:
usage.txt Usage instructions for cjpeg, djpeg, jpegtran,
rdjpgcom, and wrjpgcom.
*.1 Unix-style man pages for programs (same info as usage.txt).
wizard.txt Advanced usage instructions for JPEG wizards only.
change.log Version-to-version change highlights.
doc/usage.txt Usage instructions for cjpeg, djpeg, jpegtran,
rdjpgcom, and wrjpgcom.
doc/*.1 Unix-style man pages for programs (same info as
usage.txt).
doc/wizard.txt Advanced usage instructions for JPEG wizards only.
doc/change.log Version-to-version change highlights.
Programmer and internal documentation:
libjpeg.txt How to use the JPEG library in your own programs.
example.c Sample code for calling the JPEG library.
structure.txt Overview of the JPEG library's internal structure.
coderules.txt Coding style rules --- please read if you contribute code.
doc/libjpeg.txt How to use the JPEG library in your own programs.
src/example.c Sample code for calling the JPEG library.
doc/structure.txt Overview of the JPEG library's internal structure.
doc/coderules.txt Coding style rules --- please read if you contribute
code.
Please read at least usage.txt. Some information can also be found in the JPEG
FAQ (Frequently Asked Questions) article. See ARCHIVE LOCATIONS below to find
@ -89,9 +91,9 @@ The library is intended to be reused in other applications.
In order to support file conversion and viewing software, we have included
considerable functionality beyond the bare JPEG coding/decoding capability;
for example, the color quantization modules are not strictly part of JPEG
decoding, but they are essential for output to colormapped file formats or
colormapped displays. These extra functions can be compiled out of the
library if not required for a particular application.
decoding, but they are essential for output to colormapped file formats. These
extra functions can be compiled out of the library if not required for a
particular application.
We have also included "jpegtran", a utility for lossless transcoding between
different JPEG processes, and "rdjpgcom" and "wrjpgcom", two simple


@ -69,9 +69,12 @@ JPEG images:
generating planar YUV images and performing multiple simultaneous lossless
transforms on an image. The Java interface for libjpeg-turbo is written on
top of the TurboJPEG API. The TurboJPEG API is recommended for first-time
users of libjpeg-turbo. Refer to [tjexample.c](tjexample.c) and
[TJExample.java](java/TJExample.java) for examples of its usage and to
<http://libjpeg-turbo.org/Documentation/Documentation> for API documentation.
users of libjpeg-turbo. Refer to [tjcomp.c](src/tjcomp.c),
[tjdecomp.c](src/tjdecomp.c), [tjtran.c](src/tjtran.c),
[TJComp.java](java/TJComp.java), [TJDecomp.java](java/TJDecomp.java), and
[TJTran.java](java/TJTran.java) for examples of its usage and to
<https://libjpeg-turbo.org/Documentation/Documentation> for API
documentation.
- **libjpeg API**<br>
This is the de facto industry-standard API for compressing and decompressing
@ -79,8 +82,9 @@ JPEG images:
more powerful. The libjpeg API implementation in libjpeg-turbo is both
API/ABI-compatible and mathematically compatible with libjpeg v6b. It can
also optionally be configured to be API/ABI-compatible with libjpeg v7 and v8
(see below.) Refer to [cjpeg.c](cjpeg.c) and [djpeg.c](djpeg.c) for examples
of its usage and to [libjpeg.txt](libjpeg.txt) for API documentation.
(see below.) Refer to [cjpeg.c](src/cjpeg.c) and [djpeg.c](src/djpeg.c) for
examples of its usage and to [libjpeg.txt](doc/libjpeg.txt) for API
documentation.
There is no significant performance advantage to either API when both are used
to perform similar operations.
@ -132,9 +136,9 @@ extensions at compile time with:
#ifdef JCS_ALPHA_EXTENSIONS
[jcstest.c](jcstest.c), located in the libjpeg-turbo source tree, demonstrates
how to check for the existence of the colorspace extensions at compile time and
run time.
[jcstest.c](src/jcstest.c), located in the libjpeg-turbo source tree,
demonstrates how to check for the existence of the colorspace extensions at
compile time and run time.
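
The compile-time half of that check can be sketched as follows. This is a minimal illustration and not the contents of jcstest.c: the helper names are hypothetical, the header is only included when the compiler supports `__has_include` and can find `jpeglib.h`, and the run-time half (which needs the library itself) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* jpeglib.h expects FILE and size_t to be declared first */

/* Include jpeglib.h only when it is available, so the sketch still builds
 * on systems without the libjpeg headers (it then reports "absent"). */
#if defined(__has_include)
#  if __has_include(<jpeglib.h>)
#    include <jpeglib.h>
#  endif
#endif

/* Hypothetical helper: 1 if the headers advertise the colorspace extensions. */
static int have_colorspace_extensions(void)
{
#ifdef JCS_EXTENSIONS
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the headers advertise the alpha extensions. */
static int have_alpha_extensions(void)
{
#ifdef JCS_ALPHA_EXTENSIONS
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: write a one-line summary into buf. */
static int describe_extensions(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "JCS_EXTENSIONS=%d JCS_ALPHA_EXTENSIONS=%d",
                    have_colorspace_extensions(), have_alpha_extensions());
}
```

A header that defines the macros only shows what the program was compiled against. The library loaded at run time may be older, which is why a separate run-time check is still needed.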
libjpeg v7 and v8 API/ABI Emulation
-----------------------------------
@ -199,7 +203,7 @@ supported and which aren't.
NOTE: As of this writing, extensive research has been conducted into the
usefulness of DCT scaling as a means of data reduction and SmartScale as a
means of quality improvement. Readers are invited to peruse the research at
<http://www.libjpeg-turbo.org/About/SmartScale> and draw their own conclusions,
<https://libjpeg-turbo.org/About/SmartScale> and draw their own conclusions,
but it is the general belief of our project that these features have not
demonstrated sufficient usefulness to justify inclusion in libjpeg-turbo.


@ -273,48 +273,33 @@ endif()
check_c_source_compiles("
#include <arm_neon.h>
int main(int argc, char **argv) {
int16_t input[] = {
(int16_t)argc, (int16_t)argc, (int16_t)argc, (int16_t)argc,
(int16_t)argc, (int16_t)argc, (int16_t)argc, (int16_t)argc,
(int16_t)argc, (int16_t)argc, (int16_t)argc, (int16_t)argc
};
int16x4x3_t output = vld1_s16_x3(input);
int16_t input[12];
int16x4x3_t output;
int i;
for (i = 0; i < 12; i++) input[i] = (int16_t)argc;
output = vld1_s16_x3(input);
vst3_s16(input, output);
return (int)input[0];
}" HAVE_VLD1_S16_X3)
check_c_source_compiles("
#include <arm_neon.h>
int main(int argc, char **argv) {
uint16_t input[] = {
(uint16_t)argc, (uint16_t)argc, (uint16_t)argc, (uint16_t)argc,
(uint16_t)argc, (uint16_t)argc, (uint16_t)argc, (uint16_t)argc
};
uint16x4x2_t output = vld1_u16_x2(input);
uint16_t input[8];
uint16x4x2_t output;
int i;
for (i = 0; i < 8; i++) input[i] = (uint16_t)argc;
output = vld1_u16_x2(input);
vst2_u16(input, output);
return (int)input[0];
}" HAVE_VLD1_U16_X2)
check_c_source_compiles("
#include <arm_neon.h>
int main(int argc, char **argv) {
uint8_t input[] = {
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc,
(uint8_t)argc, (uint8_t)argc, (uint8_t)argc, (uint8_t)argc
};
uint8x16x4_t output = vld1q_u8_x4(input);
uint8_t input[64];
uint8x16x4_t output;
int i;
for (i = 0; i < 64; i++) input[i] = (uint8_t)argc;
output = vld1q_u8_x4(input);
vst4q_u8(input, output);
return (int)input[0];
}" HAVE_VLD1Q_U8_X4)
@ -369,7 +354,8 @@ if(NOT NEON_INTRINSICS)
separate_arguments(CMAKE_ASM_FLAGS_SEP UNIX_COMMAND "${CMAKE_ASM_FLAGS}")
execute_process(COMMAND ${CMAKE_ASM_COMPILER} ${CMAKE_ASM_FLAGS_SEP}
-x assembler-with-cpp -c ${CMAKE_CURRENT_BINARY_DIR}/gastest.S
RESULT_VARIABLE RESULT OUTPUT_VARIABLE OUTPUT ERROR_VARIABLE ERROR)
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR} RESULT_VARIABLE RESULT
OUTPUT_VARIABLE OUTPUT ERROR_VARIABLE ERROR)
if(NOT RESULT EQUAL 0)
message(WARNING "GAS appears to be broken. Using the full Neon SIMD intrinsics implementation.")
set(NEON_INTRINSICS 1 CACHE INTERNAL "" FORCE)


@ -2,6 +2,7 @@
* jchuff-neon.c - Huffman entropy encoding (32-bit Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -24,11 +25,11 @@
*/
#define JPEG_INTERNALS
#include "../../../jinclude.h"
#include "../../../jpeglib.h"
#include "../../../jsimd.h"
#include "../../../jdct.h"
#include "../../../jsimddct.h"
#include "../../../src/jinclude.h"
#include "../../../src/jpeglib.h"
#include "../../../src/jsimd.h"
#include "../../../src/jdct.h"
#include "../../../src/jsimddct.h"
#include "../../jsimd.h"
#include "../jchuff.h"
#include "neon-compat.h"


@ -3,7 +3,7 @@
*
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2011, Nokia Corporation and/or its subsidiary(-ies).
* Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2022, D. R. Commander.
* Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2022, 2024, D. R. Commander.
* Copyright (C) 2015-2016, 2018, 2022, Matthieu Darbois.
* Copyright (C) 2019, Google LLC.
* Copyright (C) 2020, Arm Limited.
@ -18,11 +18,11 @@
*/
#define JPEG_INTERNALS
#include "../../../jinclude.h"
#include "../../../jpeglib.h"
#include "../../../jsimd.h"
#include "../../../jdct.h"
#include "../../../jsimddct.h"
#include "../../../src/jinclude.h"
#include "../../../src/jpeglib.h"
#include "../../../src/jsimd.h"
#include "../../../src/jdct.h"
#include "../../../src/jsimddct.h"
#include "../../jsimd.h"
#include <ctype.h>


@ -2,7 +2,7 @@
* jchuff-neon.c - Huffman entropy encoding (64-bit Arm Neon)
*
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, 2022, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020, 2022, 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -25,11 +25,11 @@
*/
#define JPEG_INTERNALS
#include "../../../jinclude.h"
#include "../../../jpeglib.h"
#include "../../../jsimd.h"
#include "../../../jdct.h"
#include "../../../jsimddct.h"
#include "../../../src/jinclude.h"
#include "../../../src/jpeglib.h"
#include "../../../src/jsimd.h"
#include "../../../src/jdct.h"
#include "../../../src/jsimddct.h"
#include "../../jsimd.h"
#include "../align.h"
#include "../jchuff.h"


@ -3,7 +3,8 @@
*
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2011, Nokia Corporation and/or its subsidiary(-ies).
* Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2020, 2022, D. R. Commander.
* Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2020, 2022, 2024,
* D. R. Commander.
* Copyright (C) 2015-2016, 2018, 2022, Matthieu Darbois.
* Copyright (C) 2020, Arm Limited.
*
@ -17,11 +18,11 @@
*/
#define JPEG_INTERNALS
#include "../../../jinclude.h"
#include "../../../jpeglib.h"
#include "../../../jsimd.h"
#include "../../../jdct.h"
#include "../../../jsimddct.h"
#include "../../../src/jinclude.h"
#include "../../../src/jpeglib.h"
#include "../../../src/jsimd.h"
#include "../../../src/jdct.h"
#include "../../../src/jsimddct.h"
#include "../../jsimd.h"
#include <ctype.h>


@ -2,7 +2,7 @@
* jccolor-neon.c - colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020, 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -22,11 +22,11 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"


@ -2,6 +2,7 @@
* jcgray-neon.c - grayscale colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -21,13 +22,14 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -3,7 +3,7 @@
*
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
* Copyright (C) 2022, Matthieu Darbois. All Rights Reserved.
* Copyright (C) 2022, D. R. Commander. All Rights Reserved.
* Copyright (C) 2022, 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -23,11 +23,11 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "neon-compat.h"


@ -2,6 +2,7 @@
* jcsample-neon.c - downsampling (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -21,13 +22,14 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -2,6 +2,7 @@
* jdcolor-neon.c - colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -21,13 +22,14 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -2,6 +2,7 @@
* jdmerge-neon.c - merged upsampling/color conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -21,13 +22,14 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -2,7 +2,7 @@
* jdsample-neon.c - upsampling (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020, 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -22,12 +22,13 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -2,6 +2,7 @@
* jfdctfst-neon.c - fast integer FDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -21,13 +22,14 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -2,7 +2,7 @@
* jfdctint-neon.c - accurate integer FDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020, 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -22,11 +22,11 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"


@ -2,6 +2,7 @@
* jidctfst-neon.c - fast integer IDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -21,13 +22,14 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -2,7 +2,7 @@
* jidctint-neon.c - accurate integer IDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020, 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -22,11 +22,11 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"


@ -2,7 +2,7 @@
* jidctred-neon.c - reduced-size IDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020, 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -22,11 +22,11 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"


@ -2,6 +2,7 @@
* jquanti-neon.c - sample data conversion and quantization (Arm Neon)
*
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
* Copyright (C) 2024, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
@ -21,12 +22,13 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
#include "neon-compat.h"
#include <arm_neon.h>


@ -1,5 +1,5 @@
/*
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020, 2024, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
@ -35,3 +35,11 @@
#else
#error "Unknown compiler"
#endif
#if defined(__clang__)
#pragma clang diagnostic ignored "-Wdeclaration-after-statement"
#pragma clang diagnostic ignored "-Wc99-extensions"
#elif defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wdeclaration-after-statement"
#pragma GCC diagnostic ignored "-Wpedantic"
#endif


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"


@ -7,11 +7,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"


@ -7,11 +7,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"


@ -7,11 +7,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"


@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -7,11 +7,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -9,11 +9,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains an SSE2 implementation for Huffman coding of one block.
; The following code is based on jchuff.c; see jchuff.c for more details.

@ -7,11 +7,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains an SSE2 implementation of data preparation for progressive
; Huffman encoding. See jcphuff.c for more details.

@ -9,11 +9,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -9,11 +9,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -9,11 +9,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -9,11 +9,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -9,11 +9,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jcolsamp.inc"

@ -9,11 +9,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a floating-point implementation of the forward DCT
; (Discrete Cosine Transform). The following code is based directly on

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a floating-point implementation of the forward DCT
; (Discrete Cosine Transform). The following code is based directly on

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a fast, not so accurate integer implementation of
; the forward DCT (Discrete Cosine Transform). The following code is

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a fast, not so accurate integer implementation of
; the forward DCT (Discrete Cosine Transform). The following code is

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a slower but more accurate integer implementation of the
; forward DCT (Discrete Cosine Transform). The following code is based

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a slower but more accurate integer implementation of the
; forward DCT (Discrete Cosine Transform). The following code is based

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a slower but more accurate integer implementation of the
; forward DCT (Discrete Cosine Transform). The following code is based

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a floating-point implementation of the inverse DCT
; (Discrete Cosine Transform). The following code is based directly on

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a floating-point implementation of the inverse DCT
; (Discrete Cosine Transform). The following code is based directly on

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a floating-point implementation of the inverse DCT
; (Discrete Cosine Transform). The following code is based directly on

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a fast, not so accurate integer implementation of
; the inverse DCT (Discrete Cosine Transform). The following code is

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a fast, not so accurate integer implementation of
; the inverse DCT (Discrete Cosine Transform). The following code is

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a slower but more accurate integer implementation of the
; inverse DCT (Discrete Cosine Transform). The following code is based

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a slower but more accurate integer implementation of the
; inverse DCT (Discrete Cosine Transform). The following code is based

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains a slower but more accurate integer implementation of the
; inverse DCT (Discrete Cosine Transform). The following code is based

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains inverse-DCT routines that produce reduced-size
; output: either 4x4 or 2x2 pixels from an 8x8 DCT block.

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
;
; This file contains inverse-DCT routines that produce reduced-size
; output: either 4x4 or 2x2 pixels from an 8x8 DCT block.

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"
%include "jdct.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"
%include "jdct.inc"
@ -120,8 +116,8 @@ EXTN(jsimd_convsamp_mmx):
; Quantize/descale the coefficients, and store into coef_block
;
; This implementation is based on an algorithm described in
; "How to optimize for the Pentium family of microprocessors"
; (http://www.agner.org/assem/).
; "Optimizing subroutines in assembly language:
; An optimization guide for x86 platforms" (https://agner.org/optimize).
;
; GLOBAL(void)
; jsimd_quantize_mmx(JCOEFPTR coef_block, DCTELEM *divisors,

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"
%include "jdct.inc"

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"
%include "jdct.inc"

@ -2,18 +2,14 @@
; jquanti.asm - sample data conversion and quantization (AVX2)
;
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
; Copyright (C) 2016, 2018, D. R. Commander.
; Copyright (C) 2016, 2018, 2024, D. R. Commander.
; Copyright (C) 2016, Matthieu Darbois.
;
; Based on the x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"
%include "jdct.inc"
@ -107,8 +103,8 @@ EXTN(jsimd_convsamp_avx2):
; Quantize/descale the coefficients, and store into coef_block
;
; This implementation is based on an algorithm described in
; "How to optimize for the Pentium family of microprocessors"
; (http://www.agner.org/assem/).
; "Optimizing subroutines in assembly language:
; An optimization guide for x86 platforms" (https://agner.org/optimize).
;
; GLOBAL(void)
; jsimd_quantize_avx2(JCOEFPTR coef_block, DCTELEM *divisors,

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"
%include "jdct.inc"
@ -98,8 +94,8 @@ EXTN(jsimd_convsamp_sse2):
; Quantize/descale the coefficients, and store into coef_block
;
; This implementation is based on an algorithm described in
; "How to optimize for the Pentium family of microprocessors"
; (http://www.agner.org/assem/).
; "Optimizing subroutines in assembly language:
; An optimization guide for x86 platforms" (https://agner.org/optimize).
;
; GLOBAL(void)
; jsimd_quantize_sse2(JCOEFPTR coef_block, DCTELEM *divisors,

@ -2,7 +2,7 @@
* jsimd_i386.c
*
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2022-2023, D. R. Commander.
* Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2022-2024, D. R. Commander.
* Copyright (C) 2015-2016, 2018, 2022, Matthieu Darbois.
*
* Based on the x86 SIMD extension for IJG JPEG library,
@ -15,11 +15,11 @@
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../../src/jinclude.h"
#include "../../src/jpeglib.h"
#include "../../src/jsimd.h"
#include "../../src/jdct.h"
#include "../../src/jsimddct.h"
#include "../jsimd.h"
/*

@ -8,11 +8,7 @@
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
; This file should be assembled with NASM (Netwide Assembler) or Yasm.
%include "jsimdext.inc"

Some files were not shown because too many files have changed in this diff.