Compare commits

458 Commits

Author SHA1 Message Date
Alexander Smorkalov
21402668a1
Merge pull request #26927 from asmorkalov:as/squeeze_windows
Squeeze several Windows pipelines into one with jobs
2025-02-17 15:08:43 +03:00
Alexander Smorkalov
680fd4d975
Merge pull request #26911 from asmorkalov:as/openvx_hal_imgproc
Migrate remaining OpenVX integrations to OpenVX HAL (imgproc)
2025-02-17 13:57:17 +03:00
Alexander Smorkalov
9a3fb556c4
Merge pull request #26928 from shyama7004:doxFix
fix minor issues in calib3d Docs
2025-02-17 13:54:31 +03:00
Alexander Smorkalov
7ac939c53a Squeeze several Windows pipelines into one with jobs. 2025-02-17 13:51:48 +03:00
shyama7004
18c0368840 fix minor issues in calib3d Docs 2025-02-17 12:38:00 +05:30
Alexander Smorkalov
8065f10521
Merge pull request #26921 from shyama7004:sampsonDistance
Fix assertion in cv2.sampsonDistance
2025-02-17 09:25:30 +03:00
Alexander Smorkalov
acc9084044 Move OpenVX integrations to imgproc to OpenVX HAL
Covered functions:
- medianBlur
- Sobel
- Canny
- pyrDown
- BoxFilter
- equalizeHist
- GaussianBlur
- remap
- threshold
2025-02-15 09:55:37 +03:00
Skreg
a9cb451199
Fix assertion in cv2.sampsonDistance 2025-02-15 04:47:01 +00:00
Alexander Smorkalov
36a5176a5f
Merge pull request #26907 from asmorkalov:as/openvx_hal_features2d
Migrate remaining OpenVX integrations to OpenVX HAL (features2d)
2025-02-14 19:39:58 +03:00
Alexander Smorkalov
1de6e20463 Move OpenVX implementation for FAST to HAL. 2025-02-14 17:47:48 +03:00
Alexander Smorkalov
ae25c3194f
Merge pull request #26875 from asmorkalov:as/in_memory_models
Added trackers factory with pre-loaded dnn models #26875

Replaces https://github.com/opencv/opencv/pull/26295

Allows substituting custom models or initializing a tracker from an in-memory model.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-14 15:35:38 +03:00
Alexander Smorkalov
a8df0a06ac
Merge pull request #26902 from Kababey:patch-3
Update optical_flow.cpp
2025-02-14 15:22:04 +03:00
Alexander Smorkalov
b44b30b730
Merge pull request #26898 from mshabunin:fix-riscv-toolchain
RISC-V: error message in the toolchain file when compiler is not found
2025-02-14 13:24:25 +03:00
Alexander Smorkalov
58e557d059
Merge pull request #26903 from asmorkalov:as/openvx_hal
Migrate remaining OpenVX integrations to OpenVX HAL (core) #26903

Tested with OpenVX 1.2 & 1.3 sample implementation.

Steps to build and test:
```
git clone git@github.com:KhronosGroup/OpenVX-sample-impl.git
cd OpenVX-sample-impl
python3 Build.py --os=Linux --conf=Release
cd ..
mkdir build
cd build
# Adjust OPENVX_ROOT to the absolute path of your OpenVX-sample-impl install directory
cmake -DWITH_OPENVX=ON -DOPENVX_ROOT=/mnt/Projects/Projects/OpenVX-sample-impl/install/Linux/x64/Release/ ../opencv
make -j8
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-14 11:55:20 +03:00
Alexander Smorkalov
e4b23cf96a
Merge pull request #26920 from ConnorBaker:fix/cmake-matches-uses-regex
cmake/OpenCVDetectCUDAUtils.cmake: use IN_LIST to avoid regex matching valid capabilities
2025-02-14 11:54:05 +03:00
Connor Baker
5200419ba5 cmake/OpenCVDetectCUDAUtils.cmake: use IN_LIST to avoid regex matching valid capabilities 2025-02-13 23:47:00 +00:00
Alexander Smorkalov
def8619648
Merge pull request #26917 from asmorkalov:as/static_fastcv
Switch to static instance of FastCV
2025-02-13 21:26:42 +03:00
Maksim Shabunin
45aa502549
Merge pull request #26915 from mshabunin:fix-png-be
Resolves #26913
Related(?): #25715 #26832
2025-02-13 16:58:15 +03:00
Alexander Smorkalov
5921aae2b3 Switch to static instance of FastCV on Linux. 2025-02-13 15:58:25 +03:00
Alexander Smorkalov
8e65075c1e
Merge pull request #26895 from asmorkalov:as/mean_hal
Use HAL for cv::mean function too
2025-02-12 12:02:14 +03:00
Alexander Smorkalov
7aaada4175 Use HAL for cv::mean function too. 2025-02-12 08:51:20 +03:00
Alexander Smorkalov
62658fba24
Merge pull request #26908 from shyama7004:_DEBUG/NDEBUG
Fix _DEBUG/NDEBUG handling across modules
2025-02-12 08:18:16 +03:00
shyama7004
076bfa6431 Fix _DEBUG/NDEBUG handling across modules (#26151) 2025-02-11 22:00:44 +05:30
lve-gh
d8c2f0bcdf
Merge pull request #26884 from lve-gh:split8u_rvv_hal
[HAL] split8u RVV 1.0 #26884

### Pull Request Readiness Checklist
* Banana Pi BF3 (SpacemiT K1)
* Compiler: Syntacore Clang 18.1.4 (build 2024.12)
```
Geometric mean (ms)

                  Name of Test                   baseline  hal      hal
                                                    ui               vs
                                                                  baseline 
                                                                     ui
                                                                 (x-factor)
split::Size_Depth_Channels::(127x61, 8UC1, 2)     0.012   0.004     3.12   
split::Size_Depth_Channels::(127x61, 8UC1, 3)     0.019   0.006     2.91   
split::Size_Depth_Channels::(127x61, 8UC1, 4)     0.028   0.011     2.64   
split::Size_Depth_Channels::(127x61, 8UC1, 5)     0.067   0.033     2.02   
split::Size_Depth_Channels::(127x61, 8UC1, 6)     0.084   0.040     2.11   
split::Size_Depth_Channels::(127x61, 8UC1, 7)     0.103   0.055     1.88   
split::Size_Depth_Channels::(127x61, 8UC1, 8)     0.113   0.032     3.50   
split::Size_Depth_Channels::(640x480, 8UC1, 2)    0.454   0.179     2.54   
split::Size_Depth_Channels::(640x480, 8UC1, 3)    0.677   0.298     2.27   
split::Size_Depth_Channels::(640x480, 8UC1, 4)    0.901   0.410     2.20   
split::Size_Depth_Channels::(640x480, 8UC1, 5)    3.781   3.010     1.26   
split::Size_Depth_Channels::(640x480, 8UC1, 6)    4.886   4.009     1.22   
split::Size_Depth_Channels::(640x480, 8UC1, 7)    5.777   4.770     1.21   
split::Size_Depth_Channels::(640x480, 8UC1, 8)    4.596   1.330     3.46   
split::Size_Depth_Channels::(1280x720, 8UC1, 2)   1.377   0.709     1.94   
split::Size_Depth_Channels::(1280x720, 8UC1, 3)   2.091   1.034     2.02   
split::Size_Depth_Channels::(1280x720, 8UC1, 4)   2.744   1.573     1.74   
split::Size_Depth_Channels::(1280x720, 8UC1, 5)   9.542   6.284     1.52   
split::Size_Depth_Channels::(1280x720, 8UC1, 6)   11.114  7.850     1.42   
split::Size_Depth_Channels::(1280x720, 8UC1, 7)   14.083  11.879    1.19   
split::Size_Depth_Channels::(1280x720, 8UC1, 8)   13.524  3.865     3.50   
split::Size_Depth_Channels::(1920x1080, 8UC1, 2)  3.108   1.395     2.23   
split::Size_Depth_Channels::(1920x1080, 8UC1, 3)  4.659   2.128     2.19   
split::Size_Depth_Channels::(1920x1080, 8UC1, 4)  6.127   2.818     2.17   
split::Size_Depth_Channels::(1920x1080, 8UC1, 5)  26.733  16.625    1.61   
split::Size_Depth_Channels::(1920x1080, 8UC1, 6)  31.242  22.414    1.39   
split::Size_Depth_Channels::(1920x1080, 8UC1, 7)  35.968  27.658    1.30   
split::Size_Depth_Channels::(1920x1080, 8UC1, 8)  29.997  8.655     3.47
```
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
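For reference, `split` on 8-bit data deinterleaves a buffer with `cn` channels per pixel into `cn` separate planes; the benchmark above covers `cn` from 2 to 8. The scalar sketch below shows only the semantics that the RVV kernel accelerates. The `split8uReference` name and its signature are hypothetical and are not the HAL interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for an 8-bit split (hypothetical helper, not OpenCV's HAL signature).
// `src` holds `len` pixels with `cn` interleaved channels: c0 c1 ... c0 c1 ...
// After the call, dst[k][i] holds channel k of pixel i.
void split8uReference(const uint8_t* src, std::vector<std::vector<uint8_t>>& dst,
                      std::size_t len, int cn) {
    dst.assign(static_cast<std::size_t>(cn), std::vector<uint8_t>(len));
    for (std::size_t i = 0; i < len; ++i)
        for (int k = 0; k < cn; ++k)
            dst[k][i] = src[i * cn + k];
}
```

A vectorized kernel produces the same planes but loads and deinterleaves many pixels per instruction instead of one byte at a time.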
2025-02-11 17:57:05 +03:00
Kababey
2364f4b0b9
Update optical_flow.cpp
ref: push_back was changed to emplace_back to avoid unnecessary conversions (`Scalar(r, g, b)`).
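A minimal sketch of the difference, using a hypothetical `Color` struct as a stand-in for `cv::Scalar` (this is not the actual optical_flow.cpp code): `push_back(Color(r, g, b))` builds a temporary and then moves it into the vector, whereas `emplace_back(r, g, b)` forwards the arguments to the constructor and builds the element in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Scalar: a small value type with a 3-argument constructor.
struct Color {
    double r, g, b;
    Color(double r_, double g_, double b_) : r(r_), g(g_), b(b_) {}
};

// Builds a palette of n pseudo-random colors.
std::vector<Color> makePalette(int n) {
    std::vector<Color> palette;
    palette.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        double r = (i * 37) % 256, g = (i * 91) % 256, b = (i * 13) % 256;
        // Same result, but with an extra temporary: palette.push_back(Color(r, g, b));
        palette.emplace_back(r, g, b);  // constructs the Color directly inside the vector
    }
    return palette;
}
```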
2025-02-11 13:30:48 +03:00
Alexander Smorkalov
7eaddb8aa4
Merge pull request #26897 from shyama7004:typos-fix
fix hal_replacement typos
2025-02-11 09:37:08 +03:00
Maksim Shabunin
284660fb6c RISC-V: error message in the toolchain file when compiler is not found 2025-02-10 18:47:26 +03:00
shyama7004
a490623bf0 fix hal_replacement typos 2025-02-10 20:33:33 +05:30
Alexander Smorkalov
1e013a07c4
Merge pull request #26891 from MaximSmolskiy:refactor-test-for-filestorage-base64
Refactor test for FileStorage Base64
2025-02-10 13:31:29 +03:00
Alexander Smorkalov
ce51023ad4
Merge pull request #26894 from Kumataro:fix26893
imgcodecs: tiff: refactor Imgcodecs_Tiff_decode_Huge test
2025-02-10 10:46:31 +03:00
Kumataro
fbd8180cc1 imgcodecs: tiff: refactor reading scanlines test 2025-02-10 08:40:28 +09:00
MaximSmolskiy
4d23b56d98 Refactor test for FileStorage Base64 2025-02-09 01:38:14 +03:00
Alexander Smorkalov
0e17a879d7
Merge pull request #26890 from shyama7004:type-hint
fix wrong python type hints for imread
2025-02-08 11:19:36 +03:00
shyama7004
bbca50ecc5 fix wrong python type hints for imread 2025-02-08 11:17:34 +03:00
Kumataro
6c2d6bea2f
Merge pull request #26889 from Kumataro:fix26877
doc: update supporting imgcodec format settings #26889

Close https://github.com/opencv/opencv/issues/26877

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-08 10:12:09 +03:00
RoshniUG
b323780460
Merge pull request #26662 from RoshniUG:4.x
Update window_cocoa.mm #26662

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
- [x] Added reference to the original bug report (#26661).
- [x] Updated the code as per the reviewer's suggestion to use a ternary operator.
- [x] Verified that the feature is properly documented and can be built with CMake.
2025-02-08 09:50:53 +03:00
Alexander Smorkalov
aeac913203
Merge pull request #26882 from shyama7004:nullptr-setidentity
replace null literals with nullptr and optimize setIdentity with std::fill
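A sketch of the idea for a dense, row-major, single-channel `double` matrix (the `CV_64FC1` case): clear the whole buffer with one `std::fill`, then write the scaled value on the diagonal. The `setIdentity64f` name and the raw-buffer interface are assumptions for illustration, not the code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sets a rows x cols row-major double matrix to s * I: s on the diagonal, 0 elsewhere.
// Hypothetical helper mirroring what setIdentity does for a continuous CV_64FC1 Mat.
void setIdentity64f(double* data, int rows, int cols, double s) {
    // One bulk fill instead of visiting every element in a nested loop.
    std::fill(data, data + static_cast<std::size_t>(rows) * cols, 0.0);
    const int n = std::min(rows, cols);
    for (int i = 0; i < n; ++i)
        data[static_cast<std::size_t>(i) * cols + i] = s;
}
```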
2025-02-07 17:18:49 +03:00
Alexander Smorkalov
740388b3ce
Merge pull request #26867 from shyama7004:fix-meanStdDev
fix meanStdDev overflow for large images
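The patch itself is not reproduced here. As a generic illustration of this class of bug, the sketch below accumulates the sum and sum of squares of 8-bit pixels in 64-bit integers. A signed 32-bit sum of squares would overflow after about 33,000 pixels of value 255, since (2^31 - 1) / 65025 ≈ 33,025. The `meanStdDev8u` helper is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

struct MeanStd {
    double mean;
    double stddev;
};

// Mean and population standard deviation of n > 0 unsigned 8-bit samples.
// Both sums use 64-bit integers: even at 255 * 255 = 65025 per pixel,
// sqsum only overflows uint64_t after roughly 2.8e14 pixels.
MeanStd meanStdDev8u(const uint8_t* data, std::size_t n) {
    uint64_t sum = 0;
    uint64_t sqsum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
        sqsum += static_cast<uint64_t>(data[i]) * data[i];
    }
    const double mean = static_cast<double>(sum) / static_cast<double>(n);
    const double var = static_cast<double>(sqsum) / static_cast<double>(n) - mean * mean;
    // Clamp a tiny negative variance caused by floating-point rounding.
    return {mean, std::sqrt(var > 0.0 ? var : 0.0)};
}
```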
2025-02-07 13:13:35 +03:00
shyama7004
987ba6504b fix meanStdDev overflow for large images 2025-02-07 10:17:48 +03:00
shyama7004
32d3d54ca1 replace null literals with nullptr; optimize setIdentity with std::fill for CV_64FC1 2025-02-06 23:48:23 +05:30
天音あめ
2e909c38dc
Merge pull request #26804 from amane-ame:norm_hal_rvv
Add RISC-V HAL implementation for cv::norm and cv::normalize #26804

This patch implements `cv::norm` with norm types `NORM_INF/NORM_L1/NORM_L2/NORM_L2SQR` and the `Mat::convertTo` function in RVV_HAL using native intrinsics, optimizing the performance of `cv::norm(src)`, `cv::norm(src1, src2)`, and `cv::normalize(src)` for data types `8UC1/8UC4/32FC1`.
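For reference, the four norm types have simple scalar definitions. The sketch below spells them out for a single-channel `float` array as a plain C++ illustration of the math, not of the RVV kernels. The `NormType` enum and the `normRef` function are hypothetical stand-ins for `cv::NORM_*` and `cv::norm`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

enum class NormType { Inf, L1, L2, L2Sqr };  // hypothetical mirror of cv::NORM_INF/L1/L2/L2SQR

// INF = max |x|, L1 = sum |x|, L2SQR = sum x^2, L2 = sqrt(L2SQR).
// Accumulates in double to limit rounding error on long arrays.
double normRef(const float* src, std::size_t n, NormType type) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double v = std::abs(static_cast<double>(src[i]));
        switch (type) {
            case NormType::Inf:   acc = std::max(acc, v); break;
            case NormType::L1:    acc += v; break;
            case NormType::L2:
            case NormType::L2Sqr: acc += v * v; break;
        }
    }
    return type == NormType::L2 ? std::sqrt(acc) : acc;
}
```

The two-argument form `cv::norm(src1, src2)` applies the same definitions to the element-wise difference `src1 - src2`.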

`cv::normalize` also calls `minMaxIdx`; #26789 implements it in RVV_HAL.

Tested on MUSE-PI for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*Norm*"
$ opencv_perf_core --gtest_filter="*norm*" --perf_min_samples=300 --perf_force_samples=300
```

Only the head of the perf table is shown below because the full table is too long.

View the full perf table here: [hal_rvv_norm.pdf](https://github.com/user-attachments/files/18468255/hal_rvv_norm.pdf)

<img width="1304" alt="Untitled" src="https://github.com/user-attachments/assets/3550b671-6d96-4db3-8b5b-d4cb241da650" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-06 19:34:54 +03:00
Kumataro
01e3fe8791
Merge pull request #26859 from Kumataro:fix26858
imgcodecs:gif: support IMREAD_UNCHANGED and IMREAD_GRAYSCALE #26859

Close #26858 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-02-06 19:29:54 +03:00
Malacath-92
7563cebad5
Merge pull request #26879 from Malacath-92:4.x
Add missing include in gislandmodel.hpp #26879

Add `<exception>`, `<string>`, and `<cstddef>` includes to `gislandmodel.hpp` which are required due to the usage of `std::exception_ptr`, `std::string`, and `size_t` in this header.

Notably, one of these missing includes causes a build error with recent versions of Xcode: #26780
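The fix follows the include-what-you-use rule: a header names every standard header it needs instead of relying on some other header to pull it in transitively, which can differ between standard library implementations. A hypothetical header-style snippet (not the actual `gislandmodel.hpp` contents) that uses the same three kinds of names and compiles on its own:

```cpp
#include <cassert>
#include <cstddef>    // std::size_t
#include <exception>  // std::exception, std::exception_ptr, std::rethrow_exception
#include <stdexcept>  // std::runtime_error, for creating an error in the usage example
#include <string>     // std::string, std::to_string

// Hypothetical record that stores a failure from a named execution unit.
struct IslandError {
    std::string name;
    std::size_t index;
    std::exception_ptr error;
};

// Returns "name[index]: message" for a stored exception, or "" if none was stored.
std::string describe(const IslandError& e) {
    if (!e.error) return {};
    try {
        std::rethrow_exception(e.error);
    } catch (const std::exception& ex) {
        return e.name + "[" + std::to_string(e.index) + "]: " + ex.what();
    } catch (...) {
        return e.name + "[" + std::to_string(e.index) + "]: unknown error";
    }
}
```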

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [N/A] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [N/A] The feature is well documented and sample code can be built with the project CMake
2025-02-06 13:34:46 +03:00
Alexander Smorkalov
9a566d772f
Merge pull request #26878 from asmorkalov:as/respect_namespace_hal
Do not rely on cv namespace in HAL
2025-02-06 12:51:58 +03:00
Alexander Smorkalov
b7663086fb Do not rely on cv namespace in HAL. 2025-02-06 10:00:28 +03:00
Suleyman TURKMEN
e8e49ab7a8
Merge pull request #26872 from sturkmen72:ImageEncoders_revisions
Performance tests for image encoders and decoders and code cleanup #26872

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-02-04 12:21:55 +03:00
lawrencec98
59c3b6c995
Merge pull request #26600 from lawrencec98:Issue25250-lens-distortion-documentation-unclear
Issue25250 lens distortion documentation unclear #26600

### Pull Request Readiness Checklist

This pull request addresses the issue in https://github.com/opencv/opencv/issues/25250, using the method recommended by oleg-alexandrov.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-02-03 17:32:25 +03:00
Maksim Shabunin
b310233ea4
CI: unified Linux pipeline (#26607) 2025-02-03 10:11:07 +03:00
Alexander Smorkalov
3356b36d72
Merge pull request #26863 from shyama7004:minor-change
minor change
2025-02-03 08:37:57 +03:00
Alexander Smorkalov
0e747e592b
Merge pull request #26866 from opencv:revert-26857-patch-1
Revert "Update OpenCVFindWebP.cmake with sturkmen72's suggestion"
2025-02-03 08:25:17 +03:00
Alexander Smorkalov
4e2f0471bd
Revert "Update OpenCVFindWebP.cmake with sturkmen72's suggestion" 2025-02-01 09:27:43 +03:00
Alexander Smorkalov
43cebe52eb
Merge pull request #26857 from hmaarrfk:patch-1
Update OpenCVFindWebP.cmake with sturkmen72's suggestion
2025-02-01 09:16:57 +03:00
shyama7004
0cfc2e8fd8 minor change 2025-01-31 21:12:36 +05:30
Suleyman TURKMEN
fbd2105067
Merge pull request #26762 from sturkmen72:avif_cmake
Fixed AVIF linkage on Windows #26762 

Closes #26747

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-31 16:41:08 +03:00
天音あめ
13b2caffe0
Merge pull request #26789 from amane-ame:minmax_hal_rvv
Add RISC-V HAL implementation for minMaxIdx #26789

On the RISC-V platform, `minMaxIdx` cannot benefit from Universal Intrinsics because the UI-optimized `minMaxIdx` only supports `CV_SIMD128` (and does not accept `CV_SIMD_SCALABLE` for RVV).

1d701d1690/modules/core/src/minmax.cpp (L209-L214)

This patch implements the `minMaxIdx` function in RVV_HAL using native intrinsics, optimizing its performance for all single-channel data types.
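The semantics being accelerated fit in one pass over a single-channel array that tracks the smallest and largest values and their positions (this sketch reports the first occurrence of each). It uses a hypothetical `minMaxIdxRef` helper rather than OpenCV's signature, which additionally supports a mask and multi-dimensional indices.

```cpp
#include <cassert>
#include <cstddef>

struct MinMaxResult {
    double minVal, maxVal;
    std::size_t minIdx, maxIdx;  // linear index of the first occurrence
};

// Single pass over a non-empty single-channel array of any arithmetic type T.
template <typename T>
MinMaxResult minMaxIdxRef(const T* src, std::size_t n) {
    MinMaxResult r{static_cast<double>(src[0]), static_cast<double>(src[0]), 0, 0};
    for (std::size_t i = 1; i < n; ++i) {
        const double v = static_cast<double>(src[i]);
        if (v < r.minVal) { r.minVal = v; r.minIdx = i; }  // strict < keeps the first minimum
        if (v > r.maxVal) { r.maxVal = v; r.maxIdx = i; }  // strict > keeps the first maximum
    }
    return r;
}
```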

Tested on MUSE-PI for both gcc 14.2 and clang 20.0.

```
$ opencv_test_core --gtest_filter="*MinMaxLoc*"
$ opencv_perf_core --gtest_filter="*minMaxLoc*"
```
<img width="1122" alt="Untitled" src="https://github.com/user-attachments/assets/6a246852-87af-42c5-a50b-c349c2765f3f" />

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-31 14:26:49 +03:00
Vincent Rabaud
c21d0ad9d0
Merge pull request #26854 from vrabaud:png_leak
Fix oss-fuzz bugs 391934081 and 392318892 #26854

- fix a potential overflow in x0+w0
- use the proper function for handling the background color, so that all cases of the spec are covered
- use BGR layout for APNG background color
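The exact patch is not shown here, but the usual way to keep a check such as `x0 + w0 <= width` from overflowing is to rearrange it so that no intermediate sum is formed. A minimal sketch with a hypothetical `frameFits` helper, assuming unsigned 32-bit offsets and sizes as in APNG frame headers:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if a frame of width w0 starting at x0 fits inside [0, width].
// The naive `x0 + w0 <= width` can wrap around for large 32-bit values and wrongly
// pass; comparing w0 against the remaining space (width - x0) never overflows.
bool frameFits(uint32_t x0, uint32_t w0, uint32_t width) {
    return x0 <= width && w0 <= width - x0;
}
```

For example, with `x0 = UINT32_MAX` and `w0 = 2`, the naive sum wraps to 1 and would pass a `<= 100` check, while `frameFits` rejects it.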

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-31 11:00:23 +03:00
Alexander Smorkalov
2f58f82e84
Merge pull request #26853 from horror-proton:rvv-fast-atan
Add RISC-V HAL implementation for fastAtan32f/fastAtan64f
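Judging by the commit titles in this log, these fastAtan kernels back `cv::phase`, which returns the angle of each vector (x, y): `atan2(y, x)` shifted into [0, 2π) radians or [0, 360) degrees. The scalar sketch below (hypothetical `phaseRef`) uses `std::atan2`, so it shows the expected output range and conventions, not the fast approximation itself.

```cpp
#include <cassert>
#include <cmath>

// Angle of the vector (x, y) in [0, 360) degrees or [0, 2*pi) radians.
// Uses std::atan2, not a fast polynomial approximation.
double phaseRef(double x, double y, bool angleInDegrees) {
    const double pi = std::acos(-1.0);
    double a = std::atan2(y, x);  // in [-pi, pi]
    if (a < 0) a += 2 * pi;       // shift negative angles into [0, 2*pi)
    return angleInDegrees ? a * 180.0 / pi : a;
}
```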
2025-01-31 09:29:58 +03:00
Alexander Smorkalov
ea404df069
Merge pull request #26856 from mshabunin:fix-rvv-tests-2
RISC-V: increase DotProduct test threshold a bit
2025-01-31 09:29:15 +03:00
Mark Harfouche
2a0092cf75
Update OpenCVFindWebP.cmake with sturkmen72's suggestion
@sturkmen72  feel free to fold into https://github.com/opencv/opencv/pull/26762 but I would just like a dedicated patch to try.
2025-01-30 16:18:48 -05:00
Maksim Shabunin
f6c9ca5602 RISC-V: increase DotProduct test threshold a bit 2025-01-30 15:09:32 +03:00
Alexander Smorkalov
d5f69305cb
Merge pull request #26851 from sturkmen72:fix-22551
fix related to issue 22551
2025-01-29 18:06:22 +03:00
Skreg
bb798d15e1
Merge pull request #26831 from shyama7004:fix-denoising.cpp
Added 16-bit support to fastNlMeansDenoising and updated tests #26831

Fixes: #26582

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-29 15:45:40 +03:00
Horror Proton
86241653a7 Add RISC-V HAL implementation for cv::phase 2025-01-29 12:07:59 +08:00
Suleyman TURKMEN
585226a5fd fix for large tEXt chunk 2025-01-28 16:54:00 +03:00
Maxim Smolskiy
08a24ba2cf
Merge pull request #26846 from MaximSmolskiy:fix_bug_with_int64_support_for_FileStorage
Fix bug with int64 support for FileStorage #26846

### Pull Request Readiness Checklist

Fix #26829, https://github.com/opencv/opencv-python/issues/1078

In the current implementation of `int64` support, the raw size of a recorded integer is variable (`4` or `8` bytes depending on the value). But when we iterate over nodes, we need to know its exact size:
dfad11aae7/modules/core/src/persistence.cpp (L2596-L2609)

The bug is that the `rawSize` method still returns `4` for any integer. I haven't found a way to get a variable raw size for an integer in this method, so I made the raw size of an integer constant and equal to `8`.

Yes, after this patch memory consumption for integers will increase, but I don't know a better way to do it yet. At least this fixes the bug and makes the implementation more correct.
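A simplified model of why a constant size matters (this is not the FileStorage node format): when records are packed back to back, a reader can step from one record to the next only if it knows each record's size. Storing every integer in a fixed 8 bytes turns the stride into a constant, at the cost of extra space for small values. The `packInts` and `readInt` helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kIntRawSize = 8;  // fixed raw size for every integer

// Packs integers back to back, each in exactly kIntRawSize bytes (host byte order).
std::vector<unsigned char> packInts(const std::vector<int64_t>& values) {
    std::vector<unsigned char> buf(values.size() * kIntRawSize);
    for (std::size_t i = 0; i < values.size(); ++i)
        std::memcpy(buf.data() + i * kIntRawSize, &values[i], kIntRawSize);
    return buf;
}

// Reads the i-th integer. With a constant raw size its offset is simply i * kIntRawSize;
// if some values took 4 bytes and others 8, this offset would be wrong unless the
// size of every earlier record were known.
int64_t readInt(const std::vector<unsigned char>& buf, std::size_t i) {
    int64_t v;
    std::memcpy(&v, buf.data() + i * kIntRawSize, kIntRawSize);
    return v;
}
```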

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-28 11:08:26 +03:00
Alexander Smorkalov
0049cde1f7
Merge pull request #26826 from devatbosch:4.x
Workaround solution for issue #26818
2025-01-28 08:01:41 +03:00
Skreg
e62ab4ff71
Merge pull request #26850 from shyama7004:update-headers
Update includes in filter.hpp #26850

Fixes:
```
identifier "Mat" is undefined C/C++(20)
namespace "std" has no member "vector" C/C++(135)
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-28 07:26:40 +03:00
Kumataro
c840e24e94
Merge pull request #26844 from Kumataro:fix26843
imgcodecs: jpegxl: imdecode() directly read from memory #26844

Close #26843 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-27 17:18:28 +03:00
eplankin
ae57c54d83
Merge pull request #26463 from eplankin:icv_update_2022.0.0
Update IPP integration #26463

Please merge together with https://github.com/opencv/opencv_3rdparty/pull/88
The supported IPP version was updated to IPP 2022.0.0 for Linux and Windows. 32-bit binaries are dropped as of this release.

Previous update: https://github.com/opencv/opencv/pull/25935
2025-01-27 17:02:36 +03:00
Alexander Smorkalov
f5c06f8b91
Merge pull request #26848 from vrabaud:png
Fix overflowing pointer arithmetic.
2025-01-27 16:54:10 +03:00
Alexander Smorkalov
4403e3bad8
Merge pull request #26847 from IHni3:4.x
Fix different marker ordering in findChessboardCornersSBWithMeta with the CALIB_CB_LARGER flag
2025-01-27 16:10:55 +03:00
Alexander Smorkalov
33da8763a5
Merge pull request #26717 from s-trinh:add_border_type_doc_examples
Add examples for each `cv::BorderTypes` enum types in the documentation
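The border modes differ only in how an out-of-range index is mapped back into the row. The sketch below reimplements that mapping for a 1D row of length `len`, following the rules that `cv::borderInterpolate` applies, and pads a string so the resulting patterns can be compared with the ones in the `cv::BorderTypes` documentation. The `BorderMode` enum, `mapBorder`, and `padRow` are hypothetical stand-ins, not OpenCV API.

```cpp
#include <cassert>
#include <string>

enum class BorderMode { Constant, Replicate, Reflect, Reflect101, Wrap };  // mirrors cv::BORDER_*

// Maps index p (possibly outside [0, len)) to a valid index, or -1 for Constant.
int mapBorder(int p, int len, BorderMode mode) {
    if (p >= 0 && p < len) return p;
    switch (mode) {
        case BorderMode::Replicate:             // aaa|abcdefgh|hhh
            return p < 0 ? 0 : len - 1;
        case BorderMode::Reflect:               // cba|abcdefgh|hgf
        case BorderMode::Reflect101: {          // dcb|abcdefgh|gfe
            if (len == 1) return 0;
            const int delta = mode == BorderMode::Reflect101 ? 1 : 0;
            while (p < 0 || p >= len)           // repeat for offsets longer than the row
                p = p < 0 ? -p - 1 + delta : 2 * len - p - 1 - delta;
            return p;
        }
        case BorderMode::Wrap:                  // fgh|abcdefgh|abc
            return ((p % len) + len) % len;
        case BorderMode::Constant:              // iii|abcdefgh|iii
        default:
            return -1;
    }
}

// Pads `row` with n elements on each side; Constant uses `fill`.
std::string padRow(const std::string& row, int n, BorderMode mode, char fill = 'i') {
    std::string out;
    const int len = static_cast<int>(row.size());
    for (int p = -n; p < len + n; ++p) {
        const int q = mapBorder(p, len, mode);
        out += q < 0 ? fill : row[q];
    }
    return out;
}
```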
2025-01-27 14:02:38 +03:00
Vincent Rabaud
c5f6ed6fef Fix overflowing pointer arithmetic.
`step` and `maskStep` are used to move `pImage` forwards and backwards,
but the offset is computed in an unsigned type and relies on wraparound:
`step` is `size_t` while `seed.y` is `int` and can be negative, so the result
is unsigned and a negative offset becomes a huge positive one. Adding that
to a pointer moves it out of bounds, which is UB.
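A minimal sketch of the safe pattern (hypothetical `rowPtr` helper, not the patched code): convert the unsigned `step` to a signed `std::ptrdiff_t` before multiplying by a possibly negative row offset, so a negative `dy` moves the pointer backwards instead of wrapping to a huge unsigned offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns a pointer `dy` rows away from `row` in an image with `step` bytes per row.
// With `row + step * dy`, the int dy is converted to size_t, so dy = -1 becomes
// SIZE_MAX and the pointer arithmetic goes far out of bounds (undefined behavior).
// Doing the multiplication in ptrdiff_t keeps negative offsets negative.
uint8_t* rowPtr(uint8_t* row, std::size_t step, int dy) {
    return row + static_cast<std::ptrdiff_t>(step) * dy;
}
```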
2025-01-27 11:55:10 +01:00
tho
9dde7790cf fix different marker ordering in findChessboardCornersSBWithMeta with the CALIB_CB_LARGER flag 2025-01-27 11:10:26 +01:00
s-trinh
df5da4abcd
Merge pull request #26754 from s-trinh:add_bibtex_direct_pdf_links
Add direct pdf links in the bibliography #26754

Update existing PDF links and add new ones in the bibliography.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-27 10:28:38 +03:00
Snehasish Basu
8a8e59c8fd
Update predefined_types.py
Updated predefined_types.py to keep only the changes suggested in:

https://github.com/opencv/opencv/pull/26826#pullrequestreview-2572608505

https://github.com/opencv/opencv/pull/26826#issuecomment-2613926475
2025-01-27 10:52:28 +05:30
Johnny
4b2a33a5c6
Merge pull request #26820 from johnnynunez:patch-1
Initial support for Blackwell GPU arch #26820

- 10.0: Blackwell B100/B200
- 12.0: Blackwell RTX 50

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-25 09:51:27 +03:00
Alexander Smorkalov
6bffa64af4
Merge pull request #26841 from MaximSmolskiy:fix_data01.xml-file-for-example_cpp_logistic_regression
Fix data01.xml file for example_cpp_logistic_regression
2025-01-25 09:45:01 +03:00
Alexander Smorkalov
c637dd2646
Merge pull request #26828 from sturkmen72:imgcodecs_improvements
Imgcodecs minor improvements for better code readability
2025-01-25 09:41:10 +03:00
Suleyman TURKMEN
d4eed1c5aa
Merge pull request #26835 from sturkmen72:patch-4
Corrections to bKGD chunk writing and reading in PNG #26835

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-25 09:31:00 +03:00
Rüdiger Ihle
a2dd4ddbb2
Merge pull request #26837 from warped-rudi:zoom
Zoom functionality for Android native camera capture #26837

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-25 09:29:00 +03:00
MaximSmolskiy
8cb3ef177c Fix data01.xml file for example_cpp_logistic_regression 2025-01-25 00:53:52 +03:00
Suleyman TURKMEN
ca51d55ee3 minor improvement for better code readability 2025-01-24 15:31:53 +03:00
Gou Minghao
9bb01e799f
Merge pull request #26669 from GouMinghao:4.x
solvePnPRansac implementation for Fisheye camera model #26669

Related: https://github.com/opencv/opencv/pull/25028

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-24 14:51:10 +03:00
Pierre Chatelier
3cbb4acd2d
Merge pull request #26836 from chacha21:thresholding_compute_threshold_only
Add cv::THRESH_DRYRUN flag to get adaptive threshold values without thresholding #26836

A first proposal for #26777

Adds a `cv::THRESH_DRYRUN` flag that lets cv::threshold() compute the threshold (useful for OTSU/TRIANGLE) without actually applying it. The flag is proposed as an alternative to adding a new function, cv::computeThreshold().

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-24 14:25:21 +03:00
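A dry run only needs the threshold-selection half of Otsu's method. The sketch below computes an Otsu threshold from an 8-bit histogram and returns it without producing any output image. It is a minimal illustration under stated assumptions (8-bit single-channel data flattened into a `std::vector`), not OpenCV's implementation; `otsuThreshold` is a hypothetical name, and in the PR the equivalent is `cv::threshold()` called with `THRESH_OTSU` plus the new flag.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the Otsu threshold of 8-bit data without
// thresholding anything (the "dry run" part).
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    std::array<double, 256> hist{};
    for (std::uint8_t v : pixels) hist[v] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double sumBg = 0.0, weightBg = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightBg += hist[t];                 // number of pixels with value <= t
        if (weightBg == 0.0) continue;
        const double weightFg = total - weightBg;
        if (weightFg == 0.0) break;          // no foreground left to separate
        sumBg += t * hist[t];
        const double meanBg = sumBg / weightBg;
        const double meanFg = (sumAll - sumBg) / weightFg;
        // Between-class variance, up to a constant factor.
        const double var = weightBg * weightFg * (meanBg - meanFg) * (meanBg - meanFg);
        if (var > bestVar) { bestVar = var; best = t; }
    }
    return best;
}
```

As with `THRESH_BINARY`, pixels strictly greater than the returned value would count as foreground, so `{10, 10, 10, 200, 200, 200}` splits at 10.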
Kumataro
ab77e1cfc8
Merge pull request #26678 from Kumataro:fix26673
OpenEXR 2.2 or earlier cannot be used with C++17 or later #26678

Close https://github.com/opencv/opencv/issues/26673
Close https://github.com/opencv/opencv/issues/25313

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-24 14:18:29 +03:00
Alexander Smorkalov
9a77bef92b
Merge pull request #26832 from vrabaud:png
Move the checks to read_chunk.
2025-01-24 12:53:09 +03:00
Alexander Smorkalov
a3e95ec6d0
Merge pull request #26660 from NekoAsakura:4.x
Cocoa/highgui: replace with `@autoreleasepool` blocks
2025-01-24 11:34:27 +03:00
Vincent Rabaud
4e4eaea9a3 Move the checks to read_chunk.
Only user chunks need to be compared to PNG_USER_CHUNK_MALLOC_MAX
2025-01-23 16:37:46 +01:00
Alexander Smorkalov
4a4031dc48
Merge pull request #26601 from dai-xin:4.x
VideoCapture open camera slow
2025-01-22 20:48:29 +03:00
Rüdiger Ihle
c623a5afc1
Merge pull request #26646 from warped-rudi:refactoring
Android camera refactoring #26646

This patch set does not contain any functional changes. It just cleans up the code structure to improve readability and to prepare for future changes.

* videoio(Android): Use 'unique_ptr' instead of 'shared_ptr'
Using shared pointers for unshared data is considered an antipattern.
* videoio(Android): Make callback functions private static members
Don't leak internal functions into global namespace. Some member
variables are now private as well.
* videoio(Android): Move resolution matching into separate function
Also make internally used member functions private.
* videoio(Android): Move ranges query into separate function
Also remove some unnecessary initialisations from initCapture().
* videoio(Android): Wrap extremely long source code lines
* videoio(Android): Rename members of 'RangeValue'

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-22 16:58:14 +03:00
Vincent Rabaud
7728dd3387
Merge pull request #26782 from vrabaud:png_leak
Fix potential READ memory access #26782

This fixes https://oss-fuzz.com/testcase-detail/4923671881252864 and https://oss-fuzz.com/testcase-detail/5048650127966208

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-22 14:47:28 +03:00
Skreg
f6aa472acc
Merge pull request #26800 from shyama7004:fix-cap-orientation-auto-default
Fixed default cap_prop_orientation_auto behaviour #26800

Fixes: #26795

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-22 13:55:48 +03:00
Skreg
055dbbb848
Merge pull request #26815 from shyama7004:fix-deprecation
Replaced sprintf with snprintf #26815

Fixes: #26814

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-22 13:53:59 +03:00
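The reason for the switch: `sprintf()` has no way to know how large its destination is, while `snprintf()` takes the buffer size, never writes past it, always NUL-terminates (for a non-zero size), and returns the length the full output would have had. A minimal sketch of checking for truncation; `formatIndex` and the `"idx_%d"` format are hypothetical and not taken from the patched files.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical example of bounded formatting into a small fixed buffer.
std::string formatIndex(int index) {
    char buf[8];
    const int needed = std::snprintf(buf, sizeof(buf), "idx_%d", index);
    // A negative result is an encoding error; a result >= sizeof(buf) means
    // the output was truncated (buf still holds a valid, shortened string).
    if (needed < 0 || static_cast<std::size_t>(needed) >= sizeof(buf))
        return std::string(buf) + "...";
    return std::string(buf);
}
```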
Maxim Smolskiy
8ab0ad6e1b
Merge pull request #26810 from MaximSmolskiy:improve-robustness-for-fitEllipseAMS
Improve robustness for fitEllipseAMS #26810

### Pull Request Readiness Checklist

Related to #26694 

For degenerate inputs, `fitEllipseAMS` now adds noise to the points and retries the fit, as `fitEllipseNoDirect` and `fitEllipseDirect` already do.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-22 12:49:12 +03:00
Kumataro
ea023b72ce
Merge pull request #26788 from Kumataro:fix26767
jpegxl: support cv::IMREAD_UNCHANGED and other ImreadFlags #26788

Close https://github.com/opencv/opencv/issues/26767

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-22 10:50:43 +03:00
Suleyman TURKMEN
db962ea069
Merge pull request #26813 from sturkmen72:fix_animation
Added CV_WRAP to Animation struct #26813

closes #26808
### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-22 10:40:08 +03:00
Alexander Smorkalov
459bb12466
Merge pull request #26778 from vidipsingh:doc-fix-fontscale-behavior-puttext
Added fontScale behavior description to putText() documentation
2025-01-21 11:21:42 +03:00
Skreg
fe9405e8c0
Merge pull request #26806 from shyama7004:fix-typo
* fix a small typo

* removal of unused variable
2025-01-20 17:14:27 +03:00
Maxim Smolskiy
a2a3f5e86c
Merge pull request #26773 from MaximSmolskiy:improve-robustness-for-ellipse-fitting
Improve robustness for ellipse fitting #26773

### Pull Request Readiness Checklist

Related to #26694 

The current noise addition is inadequate: for example, it turns the degenerate case of a single horizontal line into the equally degenerate case of two parallel horizontal lines.

Better noise addition makes the fitting algorithms more robust.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-20 14:25:40 +03:00
Alexander Smorkalov
6f24d755f2
Merge pull request #26798 from brad0:opencv_powerpc_elf_aux_info
Add CMake checks for getauxval and elf_aux_info for POWER
2025-01-20 13:33:46 +03:00
Alexander Smorkalov
2db6b29a76
Merge pull request #26787 from MaximSmolskiy:fix_memory_leaks_for_JpegXLDecoder
Fix memory leaks for JpegXLDecoder
2025-01-20 13:33:10 +03:00
Alexander Smorkalov
5949cb10ee
Merge pull request #26793 from UnnamedOrange:4.x
Fix an is-empty condition in FFmpeg video capture when parsing FFmpeg options defined in the environment variables
2025-01-20 11:30:42 +03:00
Kumataro
3e1fafefbe
Merge pull request #26802 from Kumataro:fix26801
3rdparty:ittnotify: update to v3.25.4 #26802

Close https://github.com/opencv/opencv/issues/26801
See https://github.com/opencv/opencv/pull/26797

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-20 10:54:13 +03:00
Alexander Smorkalov
133fda3c56
Merge pull request #26803 from brad0:opencl_openbsd
OpenCL: OpenBSD build fix
2025-01-20 09:31:46 +03:00
Alexander Smorkalov
16cbdcf582
Merge pull request #26805 from MaximSmolskiy:fix-typo-in-matchTemplate-description
Fix typo in matchTemplate description
2025-01-20 09:28:12 +03:00
Brad Smith
918196ec1b Add CMake checks for getauxval and elf_aux_info for POWER
- Change the __unix__ check used for feature detection, since NetBSD has neither API.
- Add support for OpenBSD/powerpc64.
2025-01-19 13:27:20 -05:00
MaximSmolskiy
500e1ff763 Fix typo in matchTemplate description 2025-01-19 17:31:03 +03:00
Brad Smith
93023e1a68 OpenCL: OpenBSD build fix 2025-01-19 02:46:25 -05:00
Maksim Shabunin
3effe195cb
Merge pull request #26786 from mshabunin/fix-ppc64-gcc15
core: fixed VSX build with GCC 15
2025-01-18 16:13:28 +03:00
UnnamedOrange
8482caf348 Fix an is-empty condition in FFmpeg video capture 2025-01-18 17:01:22 +08:00
MaximSmolskiy
b7e1cba660 Fix memory leaks for JpegXLDecoder 2025-01-17 00:39:32 +03:00
Maksim Shabunin
63ef786a3a core: fixed VSX build with GCC 15 2025-01-16 23:48:29 +03:00
Neko Asakura
eff12685c5 Cocoa/highgui: replace with @autoreleasepool blocks and clean up extraneous comments 2025-01-16 11:40:41 +08:00
Vidip Singh
6ba8f4838b Added fontScale behavior description to putText() documentation
- Updated the documentation of the putText function to clarify the behavior of the fontScale parameter.
- Explained how fontScale affects text rendering: magnifying (>1), shrinking (<1), and mirroring (<0).
2025-01-15 19:17:29 +05:30
Alexander Smorkalov
1d701d1690
Merge pull request #26776 from vrabaud:ub_warp
Don't overflow pointer addition
2025-01-15 15:19:19 +03:00
Vincent Rabaud
e76924ef0d Don't overflow pointer addition
In both cases a negative value stored in an unsigned type is added, so the
pointer addition wraps around, which is undefined behavior.
2025-01-15 11:07:43 +01:00
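In C++, `p + n` is only defined while the result stays inside the same array (or one past its end). A "negative" offset held in an unsigned type is really a huge positive number, so the addition is undefined even if the final address would look right after wrap-around. A minimal sketch of the usual fix, keeping the offset in a signed `std::ptrdiff_t`; `stepBack` is a hypothetical function, not the code touched by this commit.

```cpp
#include <cstddef>

// Problematic pattern (comment only, not executed):
//   std::size_t step = ...;
//   const int* q = p + (0 - step);  // 0 - step wraps to a huge value: UB
//
// Safer: convert to a signed offset so the step really is negative.
const int* stepBack(const int* p, std::size_t step) {
    const std::ptrdiff_t offset = -static_cast<std::ptrdiff_t>(step);
    return p + offset;  // defined as long as the result stays inside the array
}
```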
Alexander Smorkalov
796adf5dc6
Merge pull request #26769 from y-guyon:patch-1
Avoid adding value to nullptr
2025-01-14 19:06:52 +03:00
Alexander Smorkalov
1a6ef7e08c
Merge pull request #26765 from asmorkalov:as/android_vulkan_build_fix
Fixed Android build with Vulkan support.
2025-01-14 16:23:22 +03:00
Yannis Guyon
b62ab874d1
Avoid adding value to nullptr
This undefined behavior can be avoided by postponing the calculation until it is needed.
2025-01-14 10:50:53 +01:00
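The "postpone until needed" idea, as a minimal sketch: when a pointer may legitimately be null (for example for an empty buffer), form `data + offset` only after checking that it will actually be dereferenced. In C++, adding a non-zero offset to a null pointer is undefined. `sumFrom` is a hypothetical function used only for illustration.

```cpp
#include <cstddef>

// Hypothetical example: `data` may be null when `count` is 0.
int sumFrom(const int* data, std::size_t count, std::size_t offset) {
    if (data == nullptr || offset >= count)
        return 0;                      // nothing to read, so no pointer arithmetic
    const int* begin = data + offset;  // computed only once it is known to be valid
    int sum = 0;
    for (const int* it = begin; it != data + count; ++it)
        sum += *it;
    return sum;
}
```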
Alexander Smorkalov
534243647e Fixed Android build with Vulkan support. 2025-01-13 21:13:22 +03:00
Alexander Smorkalov
342ced1e04
Merge pull request #26763 from vrabaud:remove_c
Remove useless C headers
2025-01-13 20:23:10 +03:00
Vincent Rabaud
bfb54aa691 Remove useless C headers 2025-01-13 16:34:28 +01:00
Alexander Smorkalov
6931a4cc06
Merge pull request #26744 from Diego1V:fixHoughSIGSEGV
Fix #26086 - Update types inside HoughLinesProbabilistic
2025-01-13 13:05:21 +03:00
Skreg
08a88816ed
Merge pull request #26753 from shyama7004:RotatedMarkers
Fix rotated aruco marker board generation #26753

### Issue: [#25884](https://github.com/opencv/opencv/issues/25884)
### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-13 10:51:03 +03:00
Alexander Smorkalov
d6f60d4ab8
Merge pull request #26757 from shyama7004:test-fix
fix threshold for photo_calibratedebevec regression test
2025-01-13 10:25:18 +03:00
Diego1V
052b2c43c3 Update types inside HoughLinesProbabilistic in order to handle large images. 2025-01-13 09:36:44 +03:00
shyama7004
5b7b887200 Photo_CalibrateDebevec.regression-fix 2025-01-12 19:45:22 +05:30
Maksym Ivashechkin
e29a70c17f
Merge pull request #26742 from ivashmak:fix_homography_inliers
Bug fix for #25546 - Updating inliers for homography estimation #26742

Fixes #25546

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-11 18:08:58 +03:00
Super
2c2866a7a6
Merge pull request #26738 from redhecker:fix
Fix bugs in GIF decoding #26738 

### Pull Request Readiness Checklist

This is related to #25691.

I fixed two bugs here:

1. The decoding setting:
According to [https://www.w3.org/Graphics/GIF/spec-gif89a.txt](https://www.w3.org/Graphics/GIF/spec-gif89a.txt):

```
    DEFERRED CLEAR CODE IN LZW COMPRESSION

    There has been confusion about where clear codes can be found in the
    data stream.  As the specification says, they may appear at anytime.  There
    is not a requirement to send a clear code when the string table is full.

    It is the encoder's decision as to when the table should be cleared.  When
    the table is full, the encoder can chose to use the table as is, making no
    changes to it until the encoder chooses to clear it.  The encoder during
    this time sends out codes that are of the maximum Code Size.

    As we can see from the above, when the decoder's table is full, it must
    not change the table until a clear code is received.  The Code Size is that
    of the maximum Code Size.  Processing other than this is done normally.

    Because of a large base of decoders that do not handle the decompression in
    this manner, we ask developers of GIF encoding software to NOT implement
    this feature until at least January 1991 and later if they see that their
    particular market is not ready for it.  This will give developers of GIF
    decoding software time to implement this feature and to get it into the
    hands of their clients before the decoders start "breaking" on the new
    GIF's.  It is not required that encoders change their software to take
    advantage of the deferred clear code, but it is for decoders.
```
At first I did not consider this case, which led to the bug discussed in #25691. The changes to lzwDecode() address this.

2. How loopCount is fetched:
In the code at https://github.com/opencv/opencv/blob/4.x/modules/imgcodecs/src/grfmt_gif.cpp#L410, if the branch is taken, 3 extra bytes are consumed, leading to unpredictable behavior.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-11 10:34:49 +03:00
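The deferred-clear-code rule quoted above comes down to how the decoder grows its string table: once all 4096 slots are used, it stops adding entries and keeps reading 12-bit codes until a clear code arrives. The sketch models only that bookkeeping (code width and next free slot), using one common convention for when the code width increases. It does not decode pixels, and `LzwState` is a hypothetical type, not the code in `lzwDecode()`.

```cpp
// Hypothetical model of GIF LZW table growth; not OpenCV's lzwDecode().
struct LzwState {
    static constexpr int kMaxCodeSize = 12;               // GIF caps codes at 12 bits
    static constexpr int kMaxCodes = 1 << kMaxCodeSize;   // 4096 table slots
    int min_code_size;  // from the image data block, e.g. 8 for 256 colours
    int code_size;      // current code width in bits
    int next_code;      // next free table slot

    explicit LzwState(int minCodeSize) : min_code_size(minCodeSize) { reset(); }

    int clear_code() const { return 1 << min_code_size; }
    int end_code() const { return clear_code() + 1; }

    // Called when a clear code is read: back to the initial width and table.
    void reset() {
        code_size = min_code_size + 1;
        next_code = clear_code() + 2;
    }

    // Called after each code that would normally create a new entry.
    // Returns false when the table is full: the decoder must then leave the
    // table unchanged and keep reading codes of the maximum width.
    bool add_entry() {
        if (next_code >= kMaxCodes)
            return false;
        ++next_code;
        if (next_code == (1 << code_size) && code_size < kMaxCodeSize)
            ++code_size;  // widen codes once the current width is exhausted
        return true;
    }
};
```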
Alexander Smorkalov
1e31f8047d
Merge pull request #26748 from vrabaud:png_leak
Fix remaining bugs in PNG reader
2025-01-11 10:21:38 +03:00
Alexander Smorkalov
bb79493a89
Merge pull request #26750 from mshabunin:fix-ppc64-vsx
core: fixed VSX intrinsics implementation
2025-01-11 09:40:21 +03:00
Vincent Rabaud
ee86f1c969 Fix remaining bugs in PNG reader
- free chunk before a potential longjmp
- do not try to allocate when the chunk is > PNG_USER_CHUNK_MALLOC_MAX
2025-01-10 17:04:39 +01:00
Maksim Shabunin
97f3f39066 core: fixed VSX intrinsics implementation 2025-01-10 18:34:11 +03:00
Skreg
f00814e38d
Merge pull request #26602 from shyama7004:minor-fix
Improved dumpVector, cv::Rect operator<< and exceptions #26602

- Applied consistent formatting to vector elements for clearer output.
- Moved `operator<<` to the `cv` namespace to align with OpenCV's coding standards and improve maintainability.  
- Enhanced error handling by including detailed exception messages using `e.what()` for better debugging.  

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-10 15:02:18 +03:00
Junrou Nishida
85f9ac4e23
Merge pull request #26713 from homuler:fix/build-ios-framework
Ensure Obj-C header files are generated correctly if under /private/var #26713

Fix #26712 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-10 14:48:56 +03:00
Alexander Smorkalov
68187de4ad
Merge pull request #26741 from shyama7004:minor-update
added POST_BUILD to add_custom_command in python_loader.cmake to avoid a warning
2025-01-10 13:39:58 +03:00
Vincent Rabaud
d12fa37eed
Merge pull request #26739 from vrabaud:png_leak
Add more boundary checks. #26739

Also fix a bug in read_chunk where png_get_uint_32(len) + 12 could wrap around (unsigned overflow) and end up < 4

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-10 11:33:43 +03:00
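The `png_get_uint_32(len) + 12 < 4` case can only arise through unsigned wrap-around. A PNG chunk consists of a 4-byte length, a 4-byte type, `len` data bytes and a 4-byte CRC, so a length close to `0xFFFFFFFF` makes the 32-bit sum wrap to a tiny value. A minimal sketch of an overflow-free check, assuming a caller-chosen size limit; `chunkSizeOk` and `maxChunkSize` are hypothetical names, not those used in OpenCV's PNG decoder.

```cpp
#include <cstdint>

// Validate a chunk length before seeking or allocating. The sum is done in
// 64 bits so it cannot wrap; the PNG specification also caps the length
// field at 2^31 - 1.
bool chunkSizeOk(std::uint32_t len, std::uint64_t maxChunkSize, std::uint64_t& total) {
    if (len > 0x7FFFFFFFu)
        return false;
    total = static_cast<std::uint64_t>(len) + 12;  // length + type + data + CRC
    return total <= maxChunkSize;
}
```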
shyama7004
05bc484eed addition of POST_BUILD 2025-01-09 20:40:56 +05:30
Alexander Smorkalov
bdb6a968ce
Merge pull request #26706 from Kumataro:fix26705
imgcodecs: fix EXR tests
2025-01-09 15:46:52 +03:00
Alexander Smorkalov
0da8c760d3
Merge pull request #26710 from Kumataro:fix26709
core: validate OPENCV_ALGO_HINT_DEFAULT option
2025-01-09 15:44:19 +03:00
Alexander Smorkalov
7e7c75e239
Merge pull request #26737 from shyama7004:minor-change
minor change
2025-01-09 15:14:28 +03:00
shyama7004
938f89a20e minor change 2025-01-08 20:17:17 +05:30
Alexander Smorkalov
d744296bbd Merge branch 'as/release_4.11.0' into 4.x 2025-01-08 17:25:28 +03:00
Alexander Smorkalov
31b0eeea0b Release 4.11.0 2025-01-08 15:47:46 +03:00
Alexander Smorkalov
d2704548b4
Merge pull request #26734 from asmorkalov:as/png_corrupted
Fixed fread size check for corrupted PNGs
2025-01-08 15:47:19 +03:00
Alexander Smorkalov
198f23890e Fixed fread size check for corrupted PNGs. 2025-01-08 14:23:43 +03:00
Alexander Smorkalov
66ffeae4b1
Merge pull request #26728 from vrabaud:png_behavior
Fix behavior change when PNG buffer is incomplete.
2025-01-08 13:23:02 +03:00
Alexander Smorkalov
38b86591ba
Merge pull request #26729 from MaximSmolskiy:change-article-for-fitEllipseDirect-function
Change article for fitEllipseDirect function
2025-01-08 12:14:45 +03:00
Vincent Rabaud
cb959b3915 Fix behavior change when PNG buffer is incomplete. 2025-01-08 09:11:38 +01:00
Alexander Smorkalov
e34eff9ab2
Merge pull request #26721 from MaximSmolskiy:fix-comment-for-fitEllipse-Java-case-accurracy-test
Fix comment for fitEllipse Java case accuracy test
2025-01-08 11:09:32 +03:00
Alexander Smorkalov
0dfd2b3628
Merge pull request #26719 from MaximSmolskiy:remove-code-duplication-from-tests-for-ellipse-fitting
Remove code duplication from tests for ellipse fitting
2025-01-08 11:07:55 +03:00
Alexander Smorkalov
4b35101d55
Merge pull request #26720 from vrabaud:png_leak
Use RAII to avoid leaks in PNG reader.
2025-01-08 10:57:28 +03:00
Alexander Smorkalov
d5087a2bd6
Merge pull request #26726 from vrabaud:png_comment
Remove extra /* in /**/ comment
2025-01-08 10:56:59 +03:00
MaximSmolskiy
0331af01ae Change article for fitEllipseDirect function 2025-01-07 22:24:48 +03:00
Vincent Rabaud
0e3d71b0e0 Remove extra /* in /**/ comment 2025-01-07 11:49:30 +01:00
MaximSmolskiy
9b85ab0a63 Fix comment for fitEllipse Java case accuracy test 2025-01-06 19:14:57 +03:00
Vincent Rabaud
d86387347d Use RAII to avoid leaks in PNG reader. 2025-01-06 16:42:30 +01:00
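RAII ties the release of a C-style resource to a C++ object's lifetime, so every exit path (early return or exception) frees it. A minimal sketch using `std::unique_ptr` with a custom deleter; `CHandle`, `createHandle`, `destroyHandle` and `decodeSomething` are hypothetical stand-ins, not libpng's API (which uses functions such as `png_create_read_struct`/`png_destroy_read_struct`).

```cpp
#include <memory>

// Hypothetical C-style resource with explicit create/destroy functions.
struct CHandle { int id; };
static int g_liveHandles = 0;  // counts handles that have not been released
CHandle* createHandle(int id) { ++g_liveHandles; return new CHandle{id}; }
void destroyHandle(CHandle* h) { if (h) { --g_liveHandles; delete h; } }

// The deleter runs whenever the owning unique_ptr goes out of scope.
struct HandleDeleter {
    void operator()(CHandle* h) const { destroyHandle(h); }
};
using HandlePtr = std::unique_ptr<CHandle, HandleDeleter>;

bool decodeSomething(int id, bool failEarly) {
    HandlePtr handle(createHandle(id));
    if (failEarly)
        return false;          // released here automatically...
    return handle->id == id;   // ...and here
}
```

One caveat: libpng reports errors with `longjmp` by default, and jumping over C++ objects with non-trivial destructors skips those destructors (formally undefined behavior), so RAII alone does not cover that error path.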
MaximSmolskiy
56dd9d51b1 Remove code duplication from tests for ellipse fitting 2025-01-06 17:13:32 +03:00
Masahiro Ogawa
fc994a6ae8
Merge pull request #21407 from sensyn-robotics:feature/weighted_hough
Feature: weighted Hough Transform #21407

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
- [x] The PR is proposed to proper branch
- [x] There is reference to original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2025-01-06 15:35:35 +03:00
Suleyman TURKMEN
2aee94752a
Merge pull request #26714 from sturkmen72:png
Fix for png durations and memory leak #26714

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-06 14:25:08 +03:00
Alexander Smorkalov
904dbe9555
Merge pull request #26716 from MaximSmolskiy:fix-tests-for-ellipse-fitting
Fix tests for ellipse fitting
2025-01-06 14:07:29 +03:00
Alexander Smorkalov
3d7eb55f75
Merge pull request #26715 from asmorkalov:as/png_leak
Fixed some memory leaks in PNG/APNG implementation
2025-01-06 12:00:17 +03:00
Alexander Smorkalov
ad36f68500 Fixed some memory leaks in PNG/APNG implementation. 2025-01-06 10:41:33 +03:00
Souriya Trinh
5000ec50db Add examples for each cv::BorderTypes enum value to better illustrate the result of each method in the documentation. 2025-01-06 03:57:35 +01:00
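The border modes can be illustrated on a single row. This is a minimal sketch that assumes the `abcdefgh` pattern notation used in the cv::BorderTypes documentation. The mode strings and helper names are hypothetical, and only five modes are covered.

```python
from typing import Optional

def border_index(p: int, n: int, mode: str) -> Optional[int]:
    """Map a possibly out-of-range index p onto [0, n) according to the border mode."""
    if 0 <= p < n:
        return p
    if mode == "constant":      # iiiiii|abcdefgh|iiiiiii
        return None             # caller substitutes the constant value
    if mode == "replicate":     # aaaaaa|abcdefgh|hhhhhhh
        return min(max(p, 0), n - 1)
    if mode == "reflect":       # fedcba|abcdefgh|hgfedcb (edge pixel repeated)
        q = p % (2 * n)
        return 2 * n - 1 - q if q >= n else q
    if mode == "reflect_101":   # gfedcb|abcdefgh|gfedcba (edge pixel not repeated)
        if n == 1:
            return 0
        q = p % (2 * n - 2)
        return 2 * n - 2 - q if q >= n else q
    if mode == "wrap":          # cdefgh|abcdefgh|abcdefg
        return p % n
    raise ValueError(f"unknown border mode: {mode}")

def pad(row: str, left: int, right: int, mode: str, fill: str = "i") -> str:
    """Extend a row by `left` and `right` elements using the given border mode."""
    n = len(row)
    out = []
    for p in range(-left, n + right):
        i = border_index(p, n, mode)
        out.append(fill if i is None else row[i])
    return "".join(out)
```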
MaximSmolskiy
3e534bb7c8 Fix tests for ellipse fitting 2025-01-06 01:27:06 +03:00
Alexander Smorkalov
ff18c9cc79
Merge pull request #26688 from sturkmen72:gif-png-webp-avif
Animated GIF APNG WEBP AVIF revisions
2025-01-04 16:44:14 +03:00
Rüdiger Ihle
a6f72f813d
Merge pull request #26698 from warped-rudi:mediandk2
AndroidMediaNdkVideoWriter pixel format enhancement #26698

* videoio(Android): Add source pixel formats RGBA and GRAY to AndroidMediaNdkVideoWriter

Let AndroidMediaNdkVideoWriter::write() deduce source pixel format from matrix type:

CV_8UC3 -> BGR   (as before)
CV_8UC4 -> RGBA  (use in conjunction with CvCameraViewFrame)
CV_8UC1 -> GRAY

* samples/android/video-recorder: Send images to VideoWriter in RGBA format
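A minimal sketch of the type-to-format mapping listed above. The function name and the string labels are hypothetical. They only mirror the table and are not part of the videoio API.

```python
import re

# Channel count -> source pixel format, as listed in the commit message (8-bit only).
_FORMAT_BY_CHANNELS = {3: "BGR", 4: "RGBA", 1: "GRAY"}

def deduce_source_format(mat_type: str) -> str:
    """Map an OpenCV type name such as 'CV_8UC4' to the writer's source pixel format."""
    match = re.fullmatch(r"CV_(\d+[USF])C(\d+)", mat_type)
    if match is None:
        raise ValueError(f"not a matrix type name: {mat_type}")
    depth, channels = match.group(1), int(match.group(2))
    if depth != "8U" or channels not in _FORMAT_BY_CHANNELS:
        raise ValueError(f"unsupported matrix type: {mat_type}")
    return _FORMAT_BY_CHANNELS[channels]
```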

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-03 17:53:00 +03:00
Alexander Smorkalov
f65006eee1
Merge pull request #26699 from vrabaud:bmp_overflow
Fix integer overflow in cv::BmpDecoder::readHeader
2025-01-03 14:43:23 +03:00
Suleyman TURKMEN
b4d0325666 GIF APNG WEBP AVIF revisions 2025-01-03 14:29:18 +03:00
Vincent Rabaud
0538e64b13
Merge pull request #26701 from vrabaud:png_leak
Fix leaks in cv::PngDecoder #26701

Bug: oss-fuzz:386688709

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2025-01-03 14:13:18 +03:00
Kumataro
b9505ac861 core: validate OPENCV_ALGO_HINT_DEFAULT option 2025-01-03 19:22:57 +09:00
Alexander Smorkalov
9e8b9a0ebb
Merge pull request #26695 from albertoZurini:py_pose_coordinates
fix: cast coordinates to int32 for compatibility with line function
2025-01-03 11:43:30 +03:00
Vincent Rabaud
845616d82c Fix integer overflow in cv::BmpDecoder::readHeader
Bug: oss-fuzz:371546812
2025-01-03 08:59:43 +01:00
Alexander Smorkalov
5e1eed5026
Merge pull request #26700 from vrabaud:png_buffer_overflow
Fix heap buffer overflow in cv::PngDecoder::read_from_io
2025-01-03 10:39:23 +03:00
Alexander Smorkalov
ffd0548651
Merge pull request #26704 from vrabaud:imgcodecs_flaky
Fix flaky Imgcodecs_APNG.imwriteanimation_bgcolor
2025-01-03 10:25:06 +03:00
Kumataro
1281317e17 imgcodecs: fix EXR tests 2025-01-03 09:53:01 +09:00
Vincent Rabaud
2f0035b23f Fix flaky Imgcodecs_APNG.imwriteanimation_bgcolor 2025-01-02 22:53:06 +01:00
Vincent Rabaud
12963ea699 Fix heap buffer overflow in cv::PngDecoder::read_from_io
Bug: oss-fuzz:386688710
2025-01-02 14:51:20 +01:00
Alberto Zurini
f2878eb337 fix: cast coordinates to int32 for compatibility with line function 2025-01-01 21:13:40 +01:00
Alexander Smorkalov
4d26e16af8
Merge pull request #26690 from MaximSmolskiy:speed-up-and-reduce-memory-consumption-for-findContours
Speed up and reduce memory consumption for findContours
2024-12-31 12:31:11 +03:00
cDc
1db982780f
Merge pull request #26379 from cdcseacave:jxl_codec
Add jxl (JPEG XL) codec support #26379

### Pull Request Readiness Checklist

Related CI and Docker changes:
- https://github.com/opencv/ci-gha-workflow/pull/190
- https://github.com/opencv-infrastructure/opencv-gha-dockerfile/pull/44

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work https://github.com/opencv/opencv/issues/20178
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-12-31 11:56:35 +03:00
Alexander Smorkalov
c803aa2ddd
Merge pull request #26057 from asmorkalov:as/android_16k_pages
Android builds update #26057

Fixes https://github.com/opencv/opencv/issues/26027
Should also address https://github.com/opencv/opencv/issues/26542
 
Changes:
- Switched to Android build tools 34, NDK 26d, target API level 34 (required by Google Play).
- Use flexible page size on Android by default to support Android 15+.
- Dummy stub for R and BuildConfig classes for javadoc.
- Java 17 everywhere.
- Strict ndkVersion and ABI list in release package.

Related:
- Docker: https://github.com/opencv-infrastructure/opencv-gha-dockerfile/pull/41
- Pipeline: https://github.com/opencv/ci-gha-workflow/pull/183

Related IPP issue with NDK 27+: https://github.com/opencv/opencv/issues/26072

Google documentation for 16 KB page size support: https://developer.android.com/guide/practices/page-sizes?hl=en

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-31 11:53:04 +03:00
Alexander Smorkalov
269ff8cd83
Merge pull request #26691 from asmorkalov:as/unstable_vkcom
Tune threshold to stabilize test with Vulkan backend.
2024-12-31 10:35:44 +03:00
Alexander Smorkalov
d2264d5868 Tune threshold to stabilize test with Vulkan backend. 2024-12-31 10:23:13 +03:00
MaximSmolskiy
f15fa21c6b Speed up and reduce memory consumption for findContours 2024-12-31 02:49:15 +03:00
Suleyman TURKMEN
8bc65a1d13
Merge pull request #25715 from sturkmen72:apng_support
Animated PNG Support #25715

Continues https://github.com/opencv/opencv/pull/25608

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-12-30 11:32:31 +03:00
Rüdiger Ihle
d39aae6bdf
Merge pull request #26656 from warped-rudi:mediandk
AndroidMediaNdkCapture pixel format enhancement #26656

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-30 11:09:11 +03:00
Alexander Smorkalov
9c33baebbd
Merge pull request #26675 from hanliutong:rvv-hal-fix
Add test cases and fix bugs in the RISC-V Vector HAL.
2024-12-29 18:09:21 +03:00
Alexander Alekhin
1b48eafe48 Merge pull request #26672 from opencv-pushbot:gitee/alalek/update_ffmpeg_4.x 2024-12-29 01:42:29 +00:00
Liutong HAN
b31f7694c5 Add test cases and fix bugs in the RVV HAL. 2024-12-27 08:39:52 +00:00
Alexander Smorkalov
94bccbecc0
Merge pull request #26635 from FantasqueX:remove-no-long-long-1
Remove useless -Wno-long-long option
2024-12-27 10:01:53 +03:00
Alexander Smorkalov
aeb7a9b383
Merge pull request #26671 from asmorkalov:as/fastcv_cmake_fix
Several fixes for FastCV handling.
2024-12-26 18:36:08 +03:00
Alexander Smorkalov
707ab39454
Merge pull request #26164 from CSBVision:patch-7
Update haveCUDA() to detect CUDA support at runtime
2024-12-26 15:56:03 +03:00
Alexander Alekhin
c64fe91ff4 ffmpeg/4.x: update FFmpeg wrapper 2024.12 2024-12-26 12:30:48 +00:00
Alexander Alekhin
4c7ea70051 videoio(test): re-enable FFmpeg tests on WIN32
- related PR25874
2024-12-26 12:29:45 +00:00
Alexander Alekhin
09892c9d17 fix FFmpeg wrapper build 2024-12-26 12:15:46 +00:00
Dmitry Kurtaev
e9982e856f
Merge pull request #25584 from dkurt:videocapture_from_buffer
Open VideoCapture from data stream #25584

### Pull Request Readiness Checklist

Add a VideoCapture option to read raw binary video data from `std::streambuf`.

There are multiple motivations:
1. Avoid creating a file on disk when the video is already in memory (received over the network or from a database).
2. Streaming mode: frame decoding starts while the file is still being transferred sequentially in chunks.

Supported backends:
* FFmpeg
* MSMF (no streaming mode)

Supported interfaces:
* C++ (std::streambuf)
* Python (io.BufferedIOBase)
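On the Python side, chunked data can be presented as an `io.BufferedIOBase` object using only the standard library. This is a minimal sketch that assumes the chunks arrive as an iterable of `bytes` (for example from a network transfer). `ChunkStream` and `open_chunked` are hypothetical helpers. The exact `cv2.VideoCapture` call that consumes such a stream is not shown, because its signature is not given here.

```python
import io
from typing import Iterable, Iterator

class ChunkStream(io.RawIOBase):
    """Expose an iterable of byte chunks as a readable, non-seekable raw stream."""

    def __init__(self, chunks: Iterable[bytes]):
        super().__init__()
        self._chunks: Iterator[bytes] = iter(chunks)
        self._pending = b""

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Pull chunks until there is data to hand out or the source is exhausted.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n

def open_chunked(chunks: Iterable[bytes], buffer_size: int = 64 * 1024) -> io.BufferedReader:
    """Wrap the chunk source in a BufferedReader, which is an io.BufferedIOBase."""
    return io.BufferedReader(ChunkStream(chunks), buffer_size=buffer_size)
```

The resulting stream is not seekable, so it only fits a pure streaming use. Whether a given backend accepts a non-seekable object is not covered by this sketch.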

resolves https://github.com/opencv/opencv/issues/24400

- [x] test h264
- [x] test an IP-camera-like approach with no metadata but key frames only?
- [x] C API plugin

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-12-26 12:48:49 +03:00
Alexander Smorkalov
745a12c03b Several fixes for FastCV handling. 2024-12-26 09:54:09 +03:00
Alexander Smorkalov
96d6395a6d
Merge pull request #26668 from WangWeiLin-MV:gstreamer/add-include-chrono
Add include chrono gstreamersource.cpp
2024-12-24 18:44:49 +03:00
WangWeiLin-MV
fe649f4adb Add include chrono gstreamersource.cpp 2024-12-24 06:53:30 +00:00
Alexander Smorkalov
f106866d3e
Merge pull request #26666 from mshabunin:fix-rvv-tests
RISC-V: enabled intrinsics in dotProd, relaxed test thresholds
2024-12-24 08:59:25 +03:00
Maksim Shabunin
0756dbfe3d RISC-V: enabled intrinsics in dotProd, relaxed test thresholds 2024-12-24 00:58:54 +03:00
Alexander Smorkalov
b42075f3e2
Merge pull request #26664 from asmorkalov:update_version_4.11.0-pre
pre: OpenCV 4.11.0 (version++)
2024-12-23 15:34:39 +03:00
Alexander Smorkalov
1399672a83
Merge pull request #26663 from mshabunin:cleanup-dnn-ie-test
dnn: remove obsolete OV models tests
2024-12-23 14:32:06 +03:00
Alexander Smorkalov
a2ce9e1bac pre: OpenCV 4.11.0 (version++) 2024-12-23 13:58:08 +03:00
Maksim Shabunin
ec2208f5f7 dnn: remove obsolete OV models tests 2024-12-23 12:59:33 +03:00
FantasqueX
4efd52f676
Merge pull request #26650 from FantasqueX:fix-26642
Use size_t when calculating size of all_points #26650

Closes: #26642 

Asan log
```
=================================================================
==41401==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fc55a02a3fc at pc 0x7fc58e304131 bp 0x7ffd54787b00 sp 0x7ffd54787af8
WRITE of size 4 at 0x7fc55a02a3fc thread T0
    #0 0x7fc58e304130 in cv::QRDetectMulti::checkSets(std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >&) /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3726
    #1 0x7fc58e3054b0 in cv::QRDetectMulti::localization() /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3829
    #2 0x7fc58e308020 in cv::ImplContour::detectMulti(cv::_InputArray const&, cv::_OutputArray const&) const /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3987
    #3 0x7fc58e30b5b1 in cv::ImplContour::detectAndDecodeMulti(cv::_InputArray const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, cv::_OutputArray const&, cv::_OutputArray const&) const /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:4176
    #4 0x7fc58e28922f in cv::GraphicalCodeDetector::detectAndDecodeMulti(cv::_InputArray const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, cv::_OutputArray const&, cv::_OutputArray const&) const /home/fanta/source/opencv/modules/objdetect/src/graphical_code_detector.cpp:42
    #5 0x5954e8 in Body /home/fanta/source/opencv/modules/objdetect/test/test_qrcode.cpp:48
    #6 0x594fc0 in TestBody /home/fanta/source/opencv/modules/objdetect/test/test_qrcode.cpp:42
    #7 0x67ee6a in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3919
    #8 0x6734a4 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3955
    #9 0x641fe8 in testing::Test::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3993
    #10 0x6431ac in testing::TestInfo::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:4169
    #11 0x643d15 in testing::TestCase::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:4287
    #12 0x659ff3 in testing::internal::UnitTestImpl::RunAllTests() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:6662
    #13 0x681205 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3919
    #14 0x675127 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:3955
    #15 0x65734c in testing::UnitTest::Run() /home/fanta/source/opencv/modules/ts/src/ts_gtest.cpp:6271
    #16 0x5907f0 in RUN_ALL_TESTS() /home/fanta/source/opencv/modules/ts/include/opencv2/ts/ts_gtest.h:22240
    #17 0x590cdd in main (/home/fanta/source/opencv-build-4.x-clang/bin/opencv_test_objdetect+0x590cdd) (BuildId: a9363fc788d57c48225fc0559ac9199d07d415db)
    #18 0x7fc58ab242ad in __libc_start_call_main (/lib64/libc.so.6+0x2a2ad) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7)
    #19 0x7fc58ab24378 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2a378) (BuildId: 03f1631dc9760d3e30311fe62e15cc4baaa89db7)
    #20 0x417014 in _start ../sysdeps/x86_64/start.S:115

0x7fc55a02a3fc is located 0 bytes after 2938510332-byte region [0x7fc4aadc8800,0x7fc55a02a3fc)
allocated by thread T0 here:
    #0 0x7fc58e590298 in operator new(unsigned long) (/lib64/libasan.so.8+0xfd298) (BuildId: da72ee674d801ced58193987786b90646d94ff8d)
    #1 0x7fc58e34d010 in std::__new_allocator<cv::Vec<int, 3> >::allocate(unsigned long, void const*) /usr/include/c++/14/bits/new_allocator.h:151

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/fanta/source/opencv/modules/objdetect/src/qrcode.cpp:3726 in cv::QRDetectMulti::checkSets(std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >, std::allocator<std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > > > >&, std::vector<cv::Point_<float>, std::allocator<cv::Point_<float> > >&)

Shadow bytes around the buggy address:
  0x7fc55a02a100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7fc55a02a180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7fc55a02a200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7fc55a02a280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7fc55a02a300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7fc55a02a380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00[04]
  0x7fc55a02a400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7fc55a02a480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7fc55a02a500: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7fc55a02a580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7fc55a02a600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==41401==ABORTING
```

`true_points_group[i].size()` is 1794, and `(true_points_group[i].size() - 2) * (true_points_group[i].size() - 1) * true_points_group[i].size()` is 5764222464, which overflows `int`.
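The arithmetic can be checked with a short Python sketch that emulates storing the product in a 32-bit two's-complement `int`. The two's-complement wraparound is an assumption about the platform: in C++, signed overflow is undefined behavior, and out-of-range conversion to `int` was implementation-defined before C++20. `wrap_int32` is a hypothetical helper.

```python
INT32_MAX = 2**31 - 1

def wrap_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

n = 1794  # true_points_group[i].size() in the reported crash
exact = (n - 2) * (n - 1) * n
print(exact, exact > INT32_MAX, wrap_int32(exact))  # prints: 5764222464 True 1469255168
```

The truncated value is still positive but much smaller than the true count. Keeping the computation in `size_t`, as the PR title describes, avoids the truncation.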

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-23 11:36:39 +03:00
alexlyulkov
aa52dafc90
Merge pull request #26127 from alexlyulkov:al/blob-from-images
Faster implementation of blobFromImages for cpu nchw output #26127

Faster implementation of blobFromImage and blobFromImages for the case of HWC cv::Mat images -> NCHW cv::Mat output.
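The layout change itself can be sketched in plain Python. Each interleaved HWC image becomes C separate H x W planes, and N images are stacked into an N x C x H x W blob. This sketch only illustrates the data movement and assumes all images share the same size. It ignores the scaling, mean subtraction, `swapRB`, and resizing options that `blobFromImages` also handles. `blob_from_images` here is a hypothetical name.

```python
from typing import List

HWCImage = List[List[List[float]]]          # img[y][x][c], interleaved like a CV_8UC3 Mat
NCHWBlob = List[List[List[List[float]]]]    # blob[n][c][y][x]

def blob_from_images(images: List[HWCImage]) -> NCHWBlob:
    """Stack N same-sized HWC images into an N x C x H x W nested list."""
    blob: NCHWBlob = []
    for img in images:
        h, w, c = len(img), len(img[0]), len(img[0][0])
        # Plane k gathers channel k of every pixel, row by row.
        planes = [[[img[y][x][k] for x in range(w)] for y in range(h)] for k in range(c)]
        blob.append(planes)
    return blob
```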

Running time on my pc in ms:

**blobFromImage**
```
image size            old        new   speed-up
32x32x3             0.008      0.002       4.0x
64x64x3             0.021      0.009       2.3x
128x128x3           0.164      0.037       4.4x
256x256x3           0.728      0.158       4.6x
512x512x3           3.310      0.628       5.3x
1024x1024x3        14.503      3.124       4.6x
2048x2048x3        61.647     28.049       2.2x
```

**blobFromImages**
```
image size            old        new   speed-up
16x32x32x3          0.122      0.041       3.0x
16x64x64x3          0.790      0.165       4.8x
16x128x128x3        3.313      0.652       5.1x
16x256x256x3       13.495      3.127       4.3x
16x512x512x3       58.795     28.127       2.1x
16x1024x1024x3    251.135    121.955       2.1x
16x2048x2048x3   1023.570    487.188       2.1x
```


### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-12-23 10:04:34 +03:00
Suleyman TURKMEN
d9a139f9e8
Merge pull request #25608 from sturkmen72:animated_webp_support
Animated WebP Support #25608

related issues #24855 #22569 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-20 13:06:28 +03:00
alex-urm
0903061589
Merge pull request #25500 from alex-urm:v4l_default_image_size
V4l default image size #25500

Added the ability to set the default image width and height for V4L capture. This is required for cameras that do not support 640x480 resolution, because otherwise V4L capture cannot be opened: it fails with "Pixel format of incoming image is unsupported by OpenCV" and then with "can't open camera by index". Because of the videoio architecture, it is not possible to insert actions between CvCaptureCAM_V4L::CvCaptureCAM_V4L and CvCaptureCAM_V4L::open, so the only way I found is to use environment variables to preselect the resolution.

Related bug report is [#25499](https://github.com/opencv/opencv/issues/25499)
This may also be related (not confirmed) to [#24551](https://github.com/opencv/opencv/issues/24551)

This fix was made and verified in my local environment: capture board AVMATRIX VC42, Ubuntu 20, NVidia Jetson Orin.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-20 11:00:30 +03:00
Alexander Smorkalov
f3d9d56ebe
Merge pull request #26625 from NekoAsakura:4.x
Cocoa/highgui: fix leak in cvGetWindowRect_COCOA
2024-12-20 09:03:33 +03:00
Alexander Smorkalov
8ffc4a6bd5
Merge pull request #26652 from mshabunin:fix-ffmpeg-plugin
videoio: fixed writer setProperty with FFmpeg plugin
2024-12-20 08:31:13 +03:00
Maksim Shabunin
b53fa94745 videoio: fixed writer setProperty with FFmpeg plugin 2024-12-19 22:04:24 +03:00
Alexander Smorkalov
6a0affdbce
Merge pull request #26147 from vrabaud:opencv_js
js: fix enum generation issues
2024-12-19 17:35:16 +03:00
Alexander Smorkalov
3073ba28cc
Merge pull request #26644 from vrabaud:opencv_js2
js: Fix C preprocessor stringification
2024-12-19 17:05:50 +03:00
Alexander Smorkalov
5baca5275e
Merge pull request #26633 from asmorkalov:as/optional_python_types
Made some pre-defined Python types optional to disable modules
2024-12-19 15:10:08 +03:00
quic-apreetam
d037b40faa
Merge pull request #26621 from CodeLinaro:apreetam_2ndPost
FastCV-based HAL for OpenCV acceleration 2ndpost-3 #26621

### Detailed description:

- Add cv_hal_canny for Canny API

Requires binary from [opencv/opencv_3rdparty#90](https://github.com/opencv/opencv_3rdparty/pull/90) 
Depends on: [opencv/opencv#26617](https://github.com/opencv/opencv/pull/26617)
Depends on: [opencv/opencv#26619](https://github.com/opencv/opencv/pull/26619) 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-19 13:31:26 +03:00
Suleyman TURKMEN
60d35d1bd5
Merge pull request #26511 from sturkmen72:proposed_fix_for_21902
* add alternative flags to cv::seamlessClone

* Update photo.hpp

* Update seamless_cloning.cpp

* Update seamless_cloning_impl.cpp
2024-12-19 11:57:58 +03:00
Alexander Smorkalov
cdad0b7027
Merge pull request #26082 from mshabunin:fix-hal-cvt-functions
imgproc: restore multiplanar conversion functions in cv::hal namespace
2024-12-19 11:56:04 +03:00
Alexander Smorkalov
ebf3c400d2
Merge pull request #26387 from sturkmen72:js-imgproc
Add some functions to OpenCV JS API
2024-12-19 09:45:23 +03:00
adsha-quic
59f762b2f0
Merge pull request #26619 from CodeLinaro:adsha_2ndPost
FastCV-based HAL for OpenCV acceleration 2ndpost-2 #26619

### Detailed description:

- Add support for multiply 8u, 16s and 32f
- Add support for cv_hal_pyrdown 8u
- Add support for cv_hal_cvtBGRtoHSV and cv_hal_cvtBGRtoYUVApprox 8u

Requires binary from [opencv/opencv_3rdparty#90](https://github.com/opencv/opencv_3rdparty/pull/90)
Depends on: [opencv/opencv#26617](https://github.com/opencv/opencv/pull/26617)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-19 08:28:24 +03:00
Alexander Smorkalov
537a2566cf
Merge pull request #26643 from vrabaud:js_clone_fix
js: Rename Mat::clone binding because it is used in Emscripten.
2024-12-19 08:20:07 +03:00
Maxim Smolskiy
9f64f021de
Merge pull request #26637 from MaximSmolskiy:fix-VideoCapture-fails-to-read-single-image-with-digits-in-name
Fix VideoCapture fails to read single image with digits in name #26637

### Pull Request Readiness Checklist

Fix #26457 

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-19 08:17:05 +03:00
Alexander Smorkalov
e747ed11cb
Merge pull request #26645 from FantasqueX:fix-typo-3
fix typo
2024-12-19 08:13:07 +03:00
Alexander Smorkalov
5cd448377a
Merge pull request #26638 from vrabaud:opencv_js1
js: add types included in bound APIs
2024-12-19 08:11:54 +03:00
Vincent Rabaud
79d019b4f1
Merge pull request #26640 from vrabaud:opencv_js3
js: fix generation of "const const" in code #26640

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-19 07:59:01 +03:00
Letu Ren
428d93114f fix typo 2024-12-19 10:59:17 +08:00
Vincent Rabaud
914a83fa0c Fix enum generation issues. 2024-12-18 22:20:05 +01:00
Vincent Rabaud
a628417f2a Fix C preprocessor stringification 2024-12-18 22:17:08 +01:00
Vincent Rabaud
773bd1a90a Rename Mat::clone binding because it is used in Emscripten.
This is in emscripten 3.1.71 and above, cf
https://github.com/emscripten-core/emscripten/pull/22734
There was a tentative fix upstream, to no avail:
https://github.com/emscripten-core/emscripten/pull/23132
2024-12-18 21:52:21 +01:00
Liutong HAN
3fbaad36d7
Merge pull request #26624 from hanliutong:rvv-mean
Add RISC-V HAL implementation for meanStdDev #26624

`meanStdDev` benefits from the Universal Intrinsic backend of RVV, but we also found that its performance on the `8UC4` type is worse than the scalar version when a mask is given, and there is no optimized implementation for `32FC1`.

This patch implements the `meanStdDev` function in RVV_HAL using native intrinsics, significantly improving the performance for `8UC1`, `8UC4` and `32FC1`.

This patch is tested on BPI-F3 for both gcc 14.2 and clang 19.1.
```
$ opencv_test_core --gtest_filter="*MeanStdDev*"
$ opencv_perf_core --gtest_filter="Size_MatType_meanStdDev*"
```

![1734077611879](https://github.com/user-attachments/assets/71c85c9d-1db1-470d-81d1-bf546e27ad86)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-18 22:19:02 +03:00
Alexander Smorkalov
7c0c9e1e55
Merge pull request #26639 from vrabaud:opencv_js2
js: fix helper.js to not trigger warnings
2024-12-18 18:49:25 +03:00
Vincent Rabaud
874e57512e js: fix helper.js to not trigger warnings 2024-12-18 11:49:08 +01:00
Vincent Rabaud
1fe9dd0c3b js: add types included in bound APIs
This fixes #25239
2024-12-18 11:43:39 +01:00
Rüdiger Ihle
d369cf6d50
Merge pull request #26627 from warped-rudi:torch
Android camera feature enhancements #26627

Closes https://github.com/opencv/opencv/issues/24687

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-18 12:16:50 +03:00
quic-xuezha
1c28a98b34
Merge pull request #26617 from CodeLinaro:xuezha_2ndPost
FastCV-based HAL for OpenCV acceleration 2ndpost-1 #26617

### Detailed description:

- Add parallel support for cv_hal_sobel
- Add cv_hal_gaussianBlurBinomial and parallel support.
- Add cv_hal_addWeighted8u and parallel support
- Add cv_hal_warpPerspective and parallel support

Requires binary from [opencv/opencv_3rdparty#90](https://github.com/opencv/opencv_3rdparty/pull/90)
Related patch to opencv_contrib: [opencv/opencv_contrib#3844](https://github.com/opencv/opencv_contrib/pull/3844)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-18 09:34:13 +03:00
Alexander Smorkalov
23f6a9ee3e
Merge pull request #26636 from FantasqueX:fix-re-warning-2
Fix Syntax warning in ts summary.py
2024-12-18 08:54:58 +03:00
Letu Ren
3899a060a3 Fix Syntax warning in ts summary.py 2024-12-18 05:27:10 +08:00
Alexander Smorkalov
7ddc02907e
Merge pull request #26634 from FantasqueX:fix-test-exif-1
Fix test_exif compilation when none of JPEG, PNG, AVIF is enabled
2024-12-17 22:36:42 +03:00
Pierre Chatelier
d77abeddd0
Merge pull request #26472 from chacha21:gpumatnd_step
More convenient GpuMatND constructor #26472

Closes #26471

For convenience, GpuMatND can now accept a step.size() equal to size.size(), as long as the last step is equal to elemSize()

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-17 17:26:14 +03:00
Letu Ren
59b9681af6 Remove useless -Wno-long-long option
According to GCC doc, -Wlong-long: Warn if long long type is used.
This is enabled by either -Wpedantic or -Wtraditional in ISO C90
and C++98 modes. To inhibit the warning messages, use -Wno-long-long.

OpenCV 4.x requires C++11. As a result, this option is useless.

Ref: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
2024-12-17 21:51:35 +08:00
Letu Ren
2e21e11318 Fix test_exif compilation when none of JPEG, PNG, AVIF is enabled
When none of JPEG, PNG, AVIF is enabled, exif_files is a zero-length
array, which is not allowed by the C++ standard.
2024-12-17 21:41:45 +08:00
Vincent Rabaud
e0001903ce
Merge pull request #26490 from vrabaud:4x_calibration_base
Switch calibration.cpp to C++ #26490

The CvLevMarq code has to be kept in order to keep the same accuracy (the C++ solver is not as good).

There are two ways to review this PR: by comparing to the old code, or by checking what is different from the 5.x version (which is the first commit).

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-12-17 16:36:14 +03:00
Alexander Smorkalov
a8f4019932 Made some pre-defined Python types optional to disable modules 2024-12-17 15:54:06 +03:00
Neko Asakura
dbb330d7be Cocoa/highgui: fix leak in cvGetWindowRect_COCOA 2024-12-17 22:31:37 +10:00
Alexander Smorkalov
0ca98d437b
Merge pull request #26632 from fengyuentau:dnn/gelu_cann
dnn: Fix CANN build
2024-12-17 15:09:33 +03:00
Yuantao Feng
51ec7fedaf fix build 2024-12-17 10:17:15 +00:00
Kumataro
260f511dfb
Merge pull request #26590 from Kumataro:fix26589
Support C++20 standard #26590

Close https://github.com/opencv/opencv/issues/26589
Related https://github.com/opencv/opencv_contrib/pull/3842
Related: https://github.com/opencv/opencv/issues/20269

- avoid arithmetic between an enum and a different enum type or a floating-point value (deprecated in C++20)
- remove unused variable

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-12-17 07:40:27 +03:00
Alexander Smorkalov
1a1b1901e8
Merge pull request #26623 from asmorkalov:as/kelidicv_0.3
Update KleidiCV to version 0.3
2024-12-16 16:58:51 +03:00
Alexander Smorkalov
a4c8d318e6 Update KleidiCV to version 0.3. 2024-12-16 15:32:44 +03:00
Alexander Smorkalov
03f90aaf85
Merge pull request #26614 from KangJialiang:fix-multi-channel-mean-scale-sample-dnn-yolo
Fix normalization parameters in YOLO example to support multi-channel mean and scale factors
2024-12-16 15:29:43 +03:00
Alexander Smorkalov
71581d9c97
Merge pull request #26618 from KangJialiang:fix/yoloPostProcessing-variable-nc
Fix yoloPostProcessing to handle variable number of classes (nc)
2024-12-14 20:59:45 +03:00
KangJialiang
25fe85bbbb Fix `yoloPostProcessing` to handle variable number of classes (nc)
Previously, the yoloPostProcessing function assumed that the number of classes (nc) was fixed at 80. This caused incorrect behavior when a different number of classes was specified, leading to mismatched output shapes.

This update modifies the code to use the provided `nc` value dynamically, ensuring that the output shapes are correctly calculated based on the specified number of classes. This prevents issues when `nc` is not equal to 80 and allows for greater flexibility in model configurations.
2024-12-12 15:41:14 +08:00
KangJialiang
42be822c1d Fix normalization parameters in YOLO example to support multi-channel mean and scale factors
This branch and commit address an issue in the YOLO example (samples/dnn/yolo_detector.cpp) where the mean and scale parameters only affected the first channel (B) due to single-value input. The modification updates these parameters to accept multi-channel values, ensuring consistent preprocessing across all image channels.
2024-12-11 20:16:21 +08:00
Maksim Shabunin
1d4110884b
Revert "CI: enable AVX2 build" (#26610) 2024-12-10 17:23:13 +03:00
Alexander Smorkalov
dc8a9d5d3d
Merge pull request #26604 from mshabunin:add-avx2-build
CI: enable AVX2 build
2024-12-10 12:11:08 +03:00
Maksim Shabunin
5ef062343c CI: enable AVX2 build 2024-12-10 11:35:56 +03:00
anandkaranubc
d85c13bcbb
Merge pull request #26587 from anandkaranubc:fix-nu-svc-parameter-check
Fix #25812: Add error handling for invalid nu parameter in SVM NU_SVC #26587

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work [Issue #25812](https://github.com/opencv/opencv/issues/25812)
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-10 09:21:25 +03:00
Alexander Smorkalov
0dfc5d416f
Merge pull request #26598 from mshabunin:fix-doc-footer
doc: upgraded for compatibility with doxygen 1.12
2024-12-10 09:19:02 +03:00
Maksim Shabunin
4ade7931e1 doc: upgraded for compatibility with doxygen 1.12 2024-12-09 23:08:18 +03:00
Alexander Smorkalov
0a2669daba
Merge pull request #26596 from y-guyon:4.x_absl_str
Support string_view in caffe_importer
2024-12-09 18:03:38 +03:00
MurtazaSaherwala
3a8d7ec75a
Merge pull request #26524 from MurtazaSaherwala:DocumentationUpdation
Updated trackbar callback function and improved documentation #26524

This Fixes #26467

Description:
This pull request improves the OpenCV documentation for the Trackbar functionality. The current documentation does not provide clear guidance on certain aspects, such as handling the deprecation of the value pointer and using callback arguments in C. This update addresses those gaps and provides an updated example for clarity.

Changes:
Updated Documentation:

Clarified the usage of the value pointer and explained how to pass an initial value, since the value pointer is deprecated.
Added more detailed explanations about callback arguments in C, ensuring that users understand how to access and use them in Trackbar callbacks.
Added a note on how to properly handle initial value passing without relying on the deprecated value pointer.
Updated Tutorial Example:

Renamed and used callback function parameters to make them more understandable.
Included a demonstration on how to utilize userdata in the callback function.
Additional Notes:

Removed reliance on the value pointer for updating trackbar values. Users are now encouraged to use other mechanisms as per the current implementation to avoid the runtime warning.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-09 16:23:07 +03:00
dai-xin
f825b4d1ab Resolve the issue of slow camera opening when using dshow as the backend for VideoCapture. 2024-12-09 17:55:46 +08:00
Yannis Guyon
1db93911ae
Support string_view in caffe_importer
An upcoming change in Protobuf will change the return types of various
methods like Descriptor::name() and Message::GetTypeName() from const
std::string& or std::string to absl::string_view. This CL fixes users
of those methods to work both before and after the change.
2024-12-09 10:24:01 +01:00
Alexander Smorkalov
1f2e7adb4b
Merge pull request #26591 from shyama7004:fix-typos
Fix typo: renamed 'search_widow_size' to 'search_window_size'
2024-12-09 11:03:49 +03:00
shyama7004
acdb707ba4 Fix typo: rename 'search_widow_size' to 'search_window_size' 2024-12-08 13:41:48 +05:30
Super
082cd7a74e
Merge pull request #25691 from redhecker:gifSupport
[GSoC] Add GIF decode and encode for imgcodecs #25691

This is related to #24855.

We add GIF support for `imread`, `imreadmulti`, `imwrite` and `imwritemulti`.

opencv_extra: https://github.com/opencv/opencv_extra/pull/1203

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-07 10:17:41 +03:00
Alexander Smorkalov
7fbf3c1fec
Merge pull request #26579 from FantasqueX:fix-re-warning-1
Fix python re warning
2024-12-06 19:24:05 +03:00
Letu Ren
ed9d64c9d3 Fix python re warning in gen_objc 2024-12-06 19:54:23 +08:00
Alexander Smorkalov
646e87c728
Merge pull request #26580 from opencv-pushbot:gitee/alalek/videoio_test_filter_unstable_gstreamer
videoio(test): filter unstable GStreamer tests
2024-12-06 14:37:14 +03:00
Alexander Alekhin
7edfb57f5a videoio(test): filter unstable GStreamer tests
- observed on Ubuntu 24.04
2024-12-06 08:12:36 +00:00
Skreg
3d91d75f1a
Merge pull request #26564 from shyama7004:improve-macos-install-docs
Improvement of macOS installation guide in documentation #26564

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-06 08:55:41 +03:00
Amir Hassan
23fcea0d33
Merge pull request #26563 from kallaballa:wayland_and_xkbcommon_missing_include_dirs
Missing include directories needed for wayland-util and xkbcommon #26563

See: #26561

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-04 17:20:15 +03:00
Alexander Smorkalov
03cedee0b0
Merge pull request #26547 from mshabunin:fix-type-cast
Fixed several cases of unaligned pointer cast
2024-12-03 11:08:45 +03:00
Alexander Smorkalov
8897002fcc
Merge pull request #26560 from savuor:rv/perf_box_5x5
Perf tests for cv::boxFilter(): 5x5 added
2024-12-03 07:40:35 +03:00
Rostislav Vasilikhin
31d04f8fd9 5x5 added for boxfilter perf tests 2024-12-02 16:46:37 +01:00
Maksim Shabunin
c58b6bf11f Fixed several cases of unaligned pointer cast 2024-12-02 16:15:23 +03:00
Alexander Smorkalov
89c19f1f1a
Merge pull request #26557 from mshabunin/fix-doc-1.12
doc: fixed issue with doxygen 1.12
2024-12-02 13:44:42 +03:00
Maksim Shabunin
f4db63ca71 doc: fixed issue with doxygen 1.12 2024-12-02 12:08:59 +03:00
Alexander Smorkalov
5f1b05af0e
Merge pull request #26556 from asmorkalov:FastcvHAL_1stPost
Added FastCV HAL changes in the 3rdparty folder.
Code changes include the HAL code, FastCV libs and headers.

Change-Id: I2f0ddb1f57515c82ae86ba8c2a82965b1a9626ec

Requires binaries from https://github.com/opencv/opencv_3rdparty/pull/86.
Related patch to opencv_contrib: https://github.com/opencv/opencv_contrib/pull/3811

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-12-02 10:50:38 +03:00
Suleyman TURKMEN
6eaa77461e add some functions and tests
applyColorMap
approxPolyN
arrowedLine
blendLinear
boxPoints
clipLine
convertMaps
createHanningWindow
divSpectrums
drawMarker
findContoursLinkRuns
fitEllipseAMS
fitEllipseDirect
getFontScaleFromHeight
getRectSubPix
HuMoments
intersectConvexConvex
invertAffineTransform
minEnclosingTriangle
preCornerDetect
rotatedRectangleIntersection
sqrBoxFilter
spatialGradient
stackBlur
2024-12-01 23:17:35 +03:00
Alexander Smorkalov
96dab6ba71
Merge pull request #26532 from mshabunin:fix-qr-bitstream
objdetect: fix invalid vector access in QR de/encoder
2024-11-29 17:55:13 +03:00
Maksim Shabunin
e953fcfaa4 objdetect: fix invalid vector access in QR encoder 2024-11-29 14:40:53 +03:00
Alexander Smorkalov
bef3585245
Merge pull request #26513 from sturkmen72:fix_for_26264
Fix for issue 26264
2024-11-29 14:26:29 +03:00
Philip Lamb
a5f8711ce1
Merge pull request #26537 from artoolkitx:emscripten-build-fixes
Emscripten build fixes #26537

- Corrects a typo in the Emscripten-only intrinsics header (Fixes https://github.com/opencv/opencv/issues/26536)
- Updates a deprecated intrinsic name to the final LLVM intrinsic name.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-11-28 09:25:01 +03:00
Alexander Smorkalov
a27d749471
Merge pull request #26516 from asmorkalov:as/pano_component_wrap
Document some stitching methods and enable bindings for them.
2024-11-28 08:19:55 +03:00
Alexander Smorkalov
fb0a40ded4
Merge pull request #26544 from savuor:rv/perf_rotate_8uc2
Perf test for rotate(): CV_8UC2 added
2024-11-28 08:18:08 +03:00
Rostislav Vasilikhin
bf914a7681 8uc2 added 2024-11-28 01:59:18 +01:00
Alexander Smorkalov
c8c64f69dd
Merge pull request #26533 from mshabunin:fix-reduce-test
test: fix technical issue with min_element in reduce tests
2024-11-27 09:08:01 +03:00
Alexander Smorkalov
68941ef8e7
Merge pull request #26530 from mshabunin:fix-usac-vector-access
calib3d: fix vector access in USAC
2024-11-26 22:15:05 +03:00
Maksim Shabunin
55b4c2ac59 test: fix technical issue with min_element in reduce tests 2024-11-26 21:55:41 +03:00
Alexander Smorkalov
fb422a62d2
Merge pull request #26529 from asmorkalov:as/include_chrono
Fixed missing include chrono in g-api tests
2024-11-26 19:11:41 +03:00
Maksim Shabunin
82c45dde5b calib3d: fix vector access in USAC 2024-11-26 16:15:51 +03:00
Alexander Smorkalov
0b01712dd3 Fixed missing include chrono in g-api tests. 2024-11-26 15:22:57 +03:00
Suleyman TURKMEN
b385767c1c Update drawing.cpp and test_contours.cpp 2024-11-25 20:35:20 +03:00
Alexander Smorkalov
65d4112fa5
Merge pull request #26512 from sturkmen72:fix_build_js_warnings
Fix for build_js warnings
2024-11-25 16:05:53 +03:00
Alexander Smorkalov
905cc45f85 Document some stitching methods and enable bindings for them. 2024-11-25 14:03:49 +03:00
Alexander Smorkalov
7095cb6904
Merge pull request #26510 from Kumataro:fix26509
doc: fix to supported depth for TIFF
2024-11-25 10:09:52 +03:00
Kumataro
5080be6669 doc: fix to supported depth for TIFF 2024-11-24 15:03:58 +09:00
Suleyman TURKMEN
1358af180c fix build_js warnings 2024-11-24 04:24:44 +03:00
Alexander Smorkalov
7be5181bff
Merge pull request #26501 from asmorkalov:as/external_kleidicv
Fixed KLEIDICV_SOURCE_PATH handling for external KleidiCV
2024-11-21 18:45:14 +03:00
Alexander Smorkalov
c3ca3f4f00 Fixed KLEIDICV_SOURCE_PATH handling for external KleidiCV. 2024-11-21 11:50:25 +03:00
Rostislav Vasilikhin
64d3111377
Merge pull request #26459 from savuor:rv/hal_absdiff_scalar
HAL added for absdiff(array, scalar) + related fixes #26459

### This PR changes
* HAL for `absdiff` when one of arguments is a scalar, including multichannel arrays and scalars
* multi-channel support for HAL `addScalar`
* proper data type check for `addScalar` when one of arguments is a scalar

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-11-19 10:35:49 +03:00
Alexander Smorkalov
64273c8a5b
Merge pull request #26461 from vrabaud:4x_calibration_base
Remove internal calib3d_c_api.h
2024-11-19 09:45:10 +03:00
Alexander Smorkalov
1b0d58a554
Merge pull request #26468 from savuor:rv/warp_perspective_test_border
warpPerspective test: borderType argument fixed
2024-11-18 11:41:18 +03:00
Alexander Smorkalov
474028ea87
Merge pull request #26478 from xkszltl:exr_ver
Check existence of OpenEXR version macros before using.
2024-11-18 11:03:34 +03:00
xkszltl
d0c8b36de8
Check existence of OpenEXR version macros before using.
They were introduced in 2.0.1 (not even in 2.0.0), and some old systems like CentOS 7 still ship 1.7 in stock.
- 60cdff8a6f (diff-c4bae0726aebe410e407db9abd406d9cf2684f82dd8a08f46d84e8b7c35cf22aR67)
2024-11-17 18:37:13 -08:00
Rostislav Vasilikhin
21cb138be8 warpPerspective border type test 2024-11-15 19:28:16 +01:00
Vincent Rabaud
8c6339c04d Remove internal calib3d_c_api.h
The new C++ code is copy/pasted from OpenCV5:
- functions initIntrinsicParams2D, subMatrix (the first 160 lines)
- function prepareDistCoeffs
- the different asserts

Not all of the API/code is ported to C++ yet, to ease the review.
2024-11-15 09:31:30 +01:00
Alexander Smorkalov
4866811933
Merge pull request #26155 from mshabunin:dnn-dispatch
dnn: use dispatching for Winograd optimizations
2024-11-14 21:28:42 +03:00
Alexander Smorkalov
3dace76c3f
Merge pull request #26462 from mshabunin:cleanup-flann-hdf5
flann: remove unused hdf5 header
2024-11-14 21:21:27 +03:00
Maksim Shabunin
b7e609d5e8 flann: remove unused hdf5 header 2024-11-14 19:44:10 +03:00
Alexander Smorkalov
e5a8e2ac79
Merge pull request #26460 from asmorkalov:as/core_c_removal_packport
Backport some of C API removal in core module implementation.
2024-11-14 18:55:53 +03:00
Alexander Smorkalov
1ff16cb551 Backport some of C API removal in core module implementation. 2024-11-14 11:24:00 +03:00
Alexander Smorkalov
e1d66643b3
Merge pull request #26303 from asmorkalov:as/kleidicv_offline
Skip KleidiCV in offline build
2024-11-13 20:05:13 +03:00
Alexander Smorkalov
11a4a06fa4
Merge pull request #26181 from sturkmen72:png_exif_test
Enable PNG exif orientation test
2024-11-13 16:57:18 +03:00
Alexander Smorkalov
1a775198ce Skip KleidiCV in offline build. 2024-11-13 15:13:19 +03:00
Rostislav Vasilikhin
67f07b16cb
Merge pull request #25624 from savuor:rv/hal_addscalar
HAL added for add(array, scalar) #25624

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-11-13 08:33:19 +03:00
Alexander Smorkalov
8fe70a1877
Merge pull request #26452 from BillyONeal:add-windows-sdk-note
Add note for people debugging DirectML detection failures to check their Windows SDK version.
2024-11-13 08:27:57 +03:00
Billy Robert O'Neal III
5f95827a5f Add note for people debugging DirectML detection failures to check their Windows SDK version.
DirectML was first included with 10.0.18362.0, but dxcore.lib, which is necessary for the check to pass, first appeared in 10.0.19041.0.
2024-11-12 11:57:39 -08:00
Alexander Smorkalov
ff639d11d4
Merge pull request #26451 from savuor:rv/fix_get_handle
Build fix for opencl_core.cpp
2024-11-12 20:50:48 +03:00
Rostislav Vasilikhin
641f43dd48 build fix 2024-11-12 17:04:42 +01:00
Dmitry Kurtaev
c230841105
Merge pull request #26446 from dkurt:file_storage_empty_and_1d_mat
* Change style of empty and 1d Mat in FileStorage

* Remove misleading 1d Mat test
2024-11-12 17:51:10 +03:00
Dmitry Kurtaev
37c2af63f0
Merge pull request #26434 from dkurt:dk/int64_file_storage_4.x
int64 data type support for FileStorage. 1d and empty Mat with exact dimensions #26434

### Pull Request Readiness Checklist

Port of https://github.com/opencv/opencv/pull/26399 to 4.x branch

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-11-11 14:13:33 +03:00
Vincent Rabaud
6f8c3b13d8
Merge pull request #26437 from vrabaud:4x_calibration_base
Backport C++ stereo/stereo_geom.cpp:5.x to calib3d/stereo_geom.cpp:4.x #26437

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-11-11 10:22:56 +03:00
Alexander Smorkalov
3fddea2ade
Merge pull request #26435 from vrabaud:4x_calibration_base
Remove unused internal C functions
2024-11-08 14:43:30 +03:00
Vincent Rabaud
3d89824423 Remove unused internal C functions 2024-11-08 10:27:02 +01:00
Vincent Rabaud
6873bdee70
backport C++ 3d/calibration_base.cpp:5.x to calib3d/calibration_base.cpp:4.x (#26414)
* Add vanilla calibration_base from 5.x

This is from 55105719dd

* Have the C implementation use the new C++ one.
2024-11-08 11:56:49 +03:00
Maksim Shabunin
9d64e2959f dnn: use dispatcher for Winograd 2024-11-07 10:51:16 +03:00
Alexander Smorkalov
5817b562b3
Merge pull request #26364 from plctlab:rvp_pt2
3rdparty: NDSRVP - Part 2.1: Filter-Related Functions
2024-11-05 18:53:00 +03:00
Alexander Smorkalov
c287423b33
Merge pull request #26331 from mshabunin:fix-unified-getenv
build: made environment access a separate feature
2024-11-05 11:03:11 +03:00
Alexander Smorkalov
c3747a6847
Merge pull request #26402 from asmorkalov:as/win_uwp_ci
Added Universal Windows Package build to CI.
2024-11-02 13:05:05 +03:00
Alexander Smorkalov
9b635da563 Added Universal Windows Package build to CI. 2024-11-02 12:20:13 +03:00
Alexander Smorkalov
ee95bfe244
Merge pull request #26203 from FantasqueX:generic-simd-warpAffineBlocklineNN
Use generic SIMD in warpAffineBlocklineNN
2024-11-01 11:16:51 +03:00
Alexander Smorkalov
ddc03c0769
Merge pull request #26390 from asmorkalov:as/kleidicv_no_sme2
Disable SME2 branches in KleidiCV as they are incompatible with some Clang versions, e.g. NDK 28b1
2024-10-31 14:09:24 +03:00
Alexander Smorkalov
cf87380fad Disable SME2 branches in KleidiCV as they are incompatible with some Clang versions, e.g. NDK 28b1. 2024-10-31 08:14:30 +03:00
Alexander Smorkalov
725ce48837
Merge pull request #26388 from vrabaud:4_8u
Fix test typo.
2024-10-31 07:58:24 +03:00
Maksim Shabunin
04818d6dd5 build: made environment access a separate feature 2024-10-30 18:37:22 +03:00
Vincent Rabaud
265a2c39b2 Fix test typo. 2024-10-30 15:05:30 +01:00
Alexander Smorkalov
2756c20e3e
Merge pull request #26384 from mshabunin:fix-winrt-warnings-2
WinRT/UWP build: fix more warnings in media part
2024-10-30 16:04:32 +03:00
Maksim Shabunin
7654d06b83 WinRT/UWP build: fix more warnings in media part 2024-10-29 19:19:09 +03:00
Alexander Smorkalov
41489f983d
Merge pull request #26381 from dkurt:dk/hotfix_dnn_debug
Hotfix ie_ngraph.cpp in Debug
2024-10-29 12:33:08 +03:00
Dmitry Kurtaev
0e80a97f87
Hotfix ie_ngraph.cpp in Debug 2024-10-29 10:20:51 +03:00
Oちゃん
8791cd147c
Merge pull request #26374 from OrkWard:fix-js-build-script
Fix incorrect string format in js build script #26374

I ran into the small problem mentioned in https://github.com/opencv/opencv/pull/25084#discussion_r1710838120 while playing with the wasm build. It seems https://github.com/EDVTAZ hasn't fixed it yet, so I created this tiny PR.

Additionally, I removed a redundant argument from an `add_argument` call: `'store_true'` already sets the default (to `False`), see https://docs.python.org/3/library/argparse.html#action.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-28 17:07:15 +03:00
Junyan721113
bf7ab8eebd feat: medianBlur & bilateralFilter 2024-10-28 17:54:45 +08:00
Alexander Smorkalov
dd08328228
Merge pull request #26368 from hanliutong:rvv-hal-license
Add the missing license header in hal_rvv.
2024-10-26 09:36:07 +03:00
Alexander Smorkalov
24a497acd8
Merge pull request #26370 from mshabunin:fix-winrt-warnings
WinRT/UWP build: fix some specific warnings
2024-10-26 09:34:00 +03:00
Maksim Shabunin
52100328d8 WinRT/UWP build: fix some specific warnings 2024-10-25 22:32:44 +03:00
Maksim Shabunin
d6fe289a79 imgproc: restore multiplanar conversion functions in cv::hal namespace 2024-10-25 20:21:33 +03:00
Liutong HAN
515b4a2689 Add the missing license description. 2024-10-25 11:37:07 +00:00
Alexander Smorkalov
e4bcd46f64
Merge pull request #26356 from hardikkamboj:4.x
Update py_thresholding.markdown
2024-10-24 12:39:43 +03:00
Liutong HAN
35571be570
Merge pull request #26318 from hanliutong:rvv-intrin-m2
Use LMUL=2 in the RISC-V Vector (RVV) backend of Universal Intrinsic. #26318

The modification of this patch involves the RVV backend of Universal Intrinsic, replacing `LMUL=1` with `LMUL=2`.

Now each Universal Intrinsic type actually corresponds to two RVV vector registers, and each Intrinsic function operates on two vector registers. Since algorithms written with Universal Intrinsics usually do not use the maximum number of registers, this lets the RVV backend use more register resources without modifying the algorithm implementations.
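As a rough illustration of what doubling LMUL means, the RVV 1.0 specification defines the maximum number of elements in one register group as VLMAX = LMUL * VLEN / SEW, and the 32 architectural vector registers form 32 / LMUL groups. The helper names below are hypothetical, and the VLEN of 128 bits is an assumed example value, not a measurement of the boards mentioned in this PR.

```cpp
#include <cstddef>

// Hypothetical helpers illustrating RVV register grouping (RVV 1.0 spec):
//   VLMAX = LMUL * VLEN / SEW
// vlen: hardware vector register width in bits (implementation-defined)
// sew:  element width in bits, e.g. 32 for float
// lmul: register group multiplier (1, 2, 4 or 8)
constexpr std::size_t vlmax(std::size_t vlen, std::size_t sew, std::size_t lmul) {
    return lmul * vlen / sew;
}

// RVV has 32 vector registers; with LMUL > 1 they are used in aligned groups,
// so fewer (but wider) logical registers are available to the code.
constexpr std::size_t registerGroups(std::size_t lmul) {
    return 32 / lmul;
}

// Example with an assumed VLEN of 128 bits and 32-bit floats:
//   LMUL=1 -> 4 floats per vector, 32 registers
//   LMUL=2 -> 8 floats per vector, 16 register groups
static_assert(vlmax(128, 32, 1) == 4, "LMUL=1 holds 4 floats at VLEN=128");
static_assert(vlmax(128, 32, 2) == 8, "LMUL=2 holds 8 floats at VLEN=128");
static_assert(registerGroups(2) == 16, "LMUL=2 leaves 16 register groups");
```

Each instruction processes twice as many lanes with LMUL=2, while the number of addressable registers halves; that trade-off only pays off when a kernel does not need many live vectors at once, which is the assumption stated above.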

Overall, this patch improves performance.

We compiled OpenCV with `Clang-19.1.1` and `GCC-14.2.0` and ran it on `CanMV-k230` and `Banana-Pi F3`, which gives four compiler/device scenarios. Of the 3363 cases in `opencv_perf_core`:
- 901 (26.8%) cases achieved more than `5%` performance improvement in all four scenarios, and the average speedup of these test cases (compared to scalar) increased from `3.35x` to `4.35x`
- 75 (2.2%) cases had more than `5%` performance loss in all four scenarios, indicating that these cases are better with `LMUL=1` than with `LMUL=2`. This involves `Mat_Transform`, `hasNonZero`, `KMeans`, `meanStdDev`, `merge` and `norm2`. Among them, `Mat_Transform` only degrades in a few cases (`8UC3`), and the actual execution time of `hasNonZero` is so short that the loss can be ignored. For `KMeans`, `meanStdDev`, `merge` and `norm2`, we should be able to use the HAL to optimize/restore their performance. (In fact, we have already done this for `merge` in #26216.)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-24 10:08:43 +03:00
Alexander Smorkalov
331412dfad
Merge pull request #26357 from dkurt:dkurt/ov_out_names_from_graph
OpenVINO friendly output names from non-compiled Model
2024-10-23 13:42:01 +03:00
Dmitry Kurtaev
d193554a5f OpenVINO friendly output names from non-compiled Model 2024-10-23 09:29:05 +03:00
Alexander Smorkalov
898a2a3811
Merge pull request #26353 from asmorkalov:as/ade_1.2e
ADE update to 0.1.2e
2024-10-23 08:10:16 +03:00
Hardik Kamboj
9fc7ca8ed1
Update py_thresholding.markdown
Changed "If the pixel value is smaller than the threshold" to "If the pixel value is smaller than or equal to the threshold" so that the sentence matches the behavior of the code.
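For reference, OpenCV documents `THRESH_BINARY` as `dst = maxval` if `src > thresh`, otherwise `0`. The scalar function below is a hypothetical sketch of that rule (not OpenCV code); it shows that a pixel exactly equal to the threshold becomes 0, which is what the corrected tutorial sentence says.

```cpp
#include <cstdint>

// Hypothetical scalar model of the documented THRESH_BINARY rule:
//   dst = (src > thresh) ? maxval : 0
// Because the comparison is strict, src == thresh maps to 0.
constexpr std::uint8_t threshBinary(std::uint8_t src, std::uint8_t thresh, std::uint8_t maxval) {
    return src > thresh ? maxval : std::uint8_t{0};
}
```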
2024-10-23 09:49:23 +05:30
Alexander Smorkalov
983086411f ADE update to 0.1.2e 2024-10-22 17:45:00 +03:00
Alexander Smorkalov
57ccbee25d
Merge pull request #26245 from cudawarped:cuda_update_to_npp_stream_ctx
cuda - update npp calls to use the new NppStreamContext API if available
2024-10-22 14:44:42 +03:00
Kumataro
4398e0b62b
Merge pull request #26340 from Kumataro:wa26339
doc: fix the position of toggle button #26340 

Close #26339 

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-22 11:57:14 +03:00
Alexander Smorkalov
94d5ad09ff
Merge pull request #26284 from fzuuzf:enum_arithmetic_fixes_for_c++26
C++26 Deprecated Enum Arithmetic Conversion: Fix core/mat.inl.hpp
2024-10-21 15:47:53 +03:00
Alexander Smorkalov
e026a5ad8a
Merge pull request #26281 from kallaballa:clgl_device_discovery
Rewrote OpenCL-OpenGL-interop device discovery routine without extensions and with Apple support
2024-10-18 15:52:17 +03:00
Alexander Smorkalov
c79b72a838
Merge pull request #26335 from migueldaipre:4.x
fix: performance typo
2024-10-18 15:44:32 +03:00
Kumataro
35dbf32227
Merge pull request #26211 from Kumataro:fix26207
imgcodecs: implement imencodemulti() #26211

Close #26207
### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-10-18 14:44:55 +03:00
Miguel Daipré
888469a842
fix: performance typo 2024-10-18 08:37:32 -03:00
Septimiu Neaga
3919f33e21
Merge pull request #26293 from SeptimiuIoachimNeagaIntel:EISW-140103_optimization_flag
G-API: Introduce optimization level flag for ONNXRT backend #26293

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-10-17 10:22:08 +03:00
FantasqueX
489df18a13
Merge pull request #26313 from FantasqueX:ipp-warp-affine-border-value
Use border value in ipp version of warp affine #26313

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-17 08:50:30 +03:00
Alexander Smorkalov
d20c456ab7
Merge pull request #26320 from mshabunin:fix-cmake-in-list
build: set cmake policy for if(IN_LIST) support
2024-10-17 07:36:02 +03:00
Maksim Shabunin
8ba76e65e9 build: set cmake policy for if(IN_LIST) support 2024-10-16 22:40:47 +03:00
Suleyman TURKMEN
8e5dbc03fe
Merge pull request #26298 from sturkmen72:avif
Proposed solution for the issue 26297 #26298

closes #26297

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-14 11:23:02 +03:00
Alexander Smorkalov
1909ac8650
Merge pull request #26212 from jamacias:feature/TickMeter-lasttime
Enhance cv::TickMeter to be able to get the last elapsed time
2024-10-14 07:56:24 +03:00
Letu Ren
45b9398d68 Use generic SIMD in warpAffineBlocklineNN 2024-10-14 01:28:41 +08:00
Zach Lowry
08f7f13dfa
Merge pull request #26234 from zachlowry:apply-gcc6-fix-on-each-directory
Move the gcc6 compatibility check to occur on a per-directory basis, … #26234

Proposed fix for #26233 https://github.com/opencv/opencv/issues/26233

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-11 17:00:59 +03:00
kallaballa
3edcf410b6 more guarding 2024-10-11 02:18:14 +02:00
Alexander Smorkalov
0f234209da
Merge pull request #26278 from Quantizs:feature-create-face-recognizer-from-buffer
Added buffer-based model loading to FaceRecognizerSF
2024-10-10 17:17:00 +03:00
kallaballa
4cbb96b396 use new instead of malloc and guard it 2024-10-10 15:14:58 +02:00
kallaballa
50f6d54f87 renaming 2024-10-10 14:48:49 +02:00
Wanli
687e37e6a8
Merge pull request #25892 from WanliZhong:v_sincos
Add support for v_sin and v_cos (Sine and Cosine) #25892

This PR aims to implement `v_sincos(v_float16 x)`, `v_sincos(v_float32 x)` and `v_sincos(v_float64 x)`. 
Merged after https://github.com/opencv/opencv/pull/25891 and https://github.com/opencv/opencv/pull/26023

**NOTE:** 
The patch also changes the already added `v_exp`, `v_log` and `v_erf` to pass parameters by reference instead of by value, to match the API of other universal intrinsics.

TODO:
- [x] double and half float precision
- [x] tests for them
- [x] doc to explain the implementation

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-10 13:25:12 +03:00
Karsten Wiese
2a681bbb6b C++26 Deprecated Arithmetic Conversion: Fix core/mat.inl.hpp
Prefix enum operands with unary '+' so that Clang in C++26 mode accepts the arithmetic on them again.
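A minimal sketch of the pattern, using hypothetical enumerations rather than the actual constants in `core/mat.inl.hpp`: C++20 deprecates arithmetic between operands of two different enumeration types and C++26 makes it ill-formed, while unary `+` first promotes each unscoped enumerator to an integer, so the sum becomes ordinary integer arithmetic.

```cpp
// Two unrelated unscoped enumerations (hypothetical names, for illustration only).
enum Depth { DEPTH_8U = 0, DEPTH_32F = 5 };
enum Flag  { FLAG_CONTINUOUS = 16384 };

// Mixing the two enumeration types directly is deprecated in C++20 and
// ill-formed in C++26:
//   int bad = DEPTH_32F + FLAG_CONTINUOUS;
// Unary '+' applies integral promotion to each operand first, so this is
// plain int + int in every language mode.
constexpr int combine(Depth d, Flag f) { return +d + +f; }
```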
2024-10-10 10:40:19 +02:00
kallaballa
63b5dee274 fixed bug: variable shadowing 2024-10-10 06:35:42 +02:00
kallaballa
8ba7389b21 properly size the devices array 2024-10-10 06:32:22 +02:00
kallaballa
885bbc643f renaming 2024-10-10 06:30:33 +02:00
kallaballa
dceeb47cd3 rewrote clgl device discovery 2024-10-10 00:02:56 +02:00
Alexander Smorkalov
69803e7b99
Merge pull request #26216 from hanliutong:rvv-hal-merge
Add the HAL implementation for the merge function on RISC-V Vector.
2024-10-09 17:07:57 +03:00
quantizs
e1b06371ad Added buffer-based model loading to FaceRecognizerSF
- Implemented a new `create` method in `FaceRecognizerSF` to allow model and configuration loading from memory buffers (std::vector<uchar>), similar to the existing functionality in `FaceDetectorYN`.
- Updated `face_recognize.cpp` with a new constructor in `FaceRecognizerSFImpl` that supports buffer-based loading for both model weights and network configuration.
- Ensured compatibility with both file-based and buffer-based model loading by maintaining consistent backend and target settings across both constructors.
- This change improves flexibility, allowing FaceRecognizerSF to be instantiated from memory buffers, which is useful for dynamic model loading scenarios such as embedded systems or applications where models are loaded in-memory.
2024-10-09 15:13:47 +02:00
Suleyman TURKMEN
e72efd0d32
Merge pull request #26260 from sturkmen72:upd_doc_4_x
Update Documentation #26260

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-09 09:09:51 +03:00
george
cefde84a76
Merge pull request #25909 from gblikas:patch-1
Update intrin_wasm.hpp #25909

See https://github.com/microsoft/vcpkg/issues/33443 for some build context when running:

```
vcpkg install opencv4:wasm32-emscripten
```

In `emsdk` >= 3.1.4, `__EMSCRIPTEN_major__`, `__EMSCRIPTEN_minor__` and `__EMSCRIPTEN_tiny__` are defined in a header rather than on the command line.

I could potentially be more aggressive in how I check this property; let me know if I should make that change.

It may also be worth auto-including `-msimd128` in the associated OpenCV portfile, but that's a separate issue. Let me know if I should make that change as well.

Special thanks to https://github.com/youar for supporting this work; please inform if applying a copyright-header is appropriate attribution.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-10-09 08:36:10 +03:00
Alexander Smorkalov
7d9014e09e
Merge pull request #26263 from mlourakis:4.x
inversion checks
2024-10-08 20:50:15 +03:00
Kumataro
40428d919d
Merge pull request #26259 from Kumataro:fix26258
core: C-API cleanup: RNG algorithms in core(4.x) #26259

- replace CV_RAND_UNI and CV_RAND_NORMAL with cv::RNG::UNIFORM and cv::RNG::NORMAL.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-10-08 15:55:00 +03:00
Alexander Smorkalov
28efc21530
Merge pull request #26187 from inayd:26130-fixFillPolyBoundaries
Fix fillPoly drawing over boundaries
2024-10-07 17:13:03 +03:00
Alexander Smorkalov
cda9f4197e
Merge pull request #26266 from mshabunin:fix-rvv071-build
RISC-V: fix build with RVV 0.7.1
2024-10-07 16:14:01 +03:00
Maksim Shabunin
73d68f3f49 RISC-V: fix build with RVV 0.7.1 2024-10-07 12:53:23 +03:00
Manolis Lourakis
fa6d6520c7
inversion checks
Extra checks for corner cases in 3x3 matrix inversion
2024-10-06 17:24:15 +03:00
cudawarped
e375d5786b cuda - update npp calls to use the new NppStreamContext API if available 2024-10-03 15:13:04 +03:00
Alexander Smorkalov
3901426d85
Merge pull request #26241 from asmorkalov:as/kelidicv-0.2
Updated KleidiCV HAL to version 0.2. #26241

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-10-03 15:04:25 +03:00
Alexander Smorkalov
ae1fb8c033
Merge pull request #26224 from mshabunin:cpp-videoio-backport
C-API cleanup: backport videoio changes from 5.x
2024-10-03 14:41:20 +03:00
Wanli
783fe72756
Resolve Compilation Error for v_func Function in SIMD Emulator (#25891)
* use 2 parms for now to identify the error

* Revert "use 2 parms for now to identify the error"

This reverts commit 86faf993a7f291708c6ab44f4b984650d4542b38.

* replace += with =

* add v_log ref

* refactor intrin_math code

* Add include guard to `intrin_math.hpp` to prevent multiple inclusions

* rename VX to V; make fp64 impl in neon be optional

* add v_setall, v_setzero for all backends; rewrite the intrin_math

* fix error on rvv_scalable

* let v_erf use v_exp_default_32f function

* 1. replaced 'v_setzero(VecType dummy)' with 'v_setzero_<VecType>()'
2. replaced 'v_setall(LaneType x, VecType dummy)' with 'v_setall_<VecType>(LaneType x)'
3. added tests for the new v_setzero_<> and v_setall_<>.

* gcc does not seem to like static_assert in functions even when they are not used

* trying to fix compile errors in Debug mode on Linux

---------

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
2024-10-02 21:28:48 +03:00
Alexander Smorkalov
73b3b24c56
Merge pull request #26236 from asmorkalov:as/HAL_pyrlk_hack_documentation
Added HAL documentation note for out-of-bound hack in optical flow LK.
2024-10-02 17:49:09 +03:00
Alexander Smorkalov
1aa325a460 Added HAL documentation note for out-of-bound hack in optical flow LK. 2024-10-02 12:38:25 +03:00
Alexander Smorkalov
292ee28913
Merge pull request #26230 from mshabunin:cpp-photo-4x
C-API cleanup: inpaint algorithms in photo (4.x)
2024-10-02 08:13:39 +03:00
Alexander Smorkalov
b8eed54ced
Merge pull request #26228 from mshabunin:cpp-features2d-4x
C-API cleanup: use AutoBuffer in MSER (4.x)
2024-10-02 08:11:28 +03:00
inayd
93a882d2e2 Fix fillPoly drawing over boundaries 2024-10-01 21:17:42 +02:00
Maksim Shabunin
807170d5c9 C-API cleanup: inpaint algorithms in photo 2024-10-01 20:10:35 +03:00
Maksim Shabunin
72023951ea C-API cleanup: use AutoBuffer in MSER 2024-10-01 18:44:22 +03:00
Maksim Shabunin
305b57e622 C-API cleanup: backport videoio changes from 5.x 2024-10-01 17:06:08 +03:00
Alexander Smorkalov
658336b366
Merge pull request #26219 from mlourakis:4.x
SQPnP solver updates
2024-10-01 14:53:20 +03:00
Manolis Lourakis
086b999013
SQPnP solver updates
Mirror most recent changes from https://github.com/terzakig/sqpnp/pull/24
  - rank revealing QR in nullspace computation
  - sqrt-free Cholesky (i.e., L*D*Lt) in the SQP solution
  - replaced divisions with multiplications by inverses
  - simplified checks in computeRowAndNullspace()
  - removed unnecessary negations
  - broke some dependency chains with parentheses
  - minor other changes
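The SQPnP code itself is not shown here. As a generic sketch (not the SQPnP implementation), a sqrt-free Cholesky factorization A = L D L^T of a symmetric matrix can be written as below; precomputing 1/d_j once and multiplying by it is one way to apply the "multiplications by inverses" idea from the list. The function name `ldlt` and the row-major storage are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sqrt-free Cholesky (LDL^T) of a symmetric n x n matrix stored
// row-major in `a`. On success, `L` is unit lower-triangular and `d` holds the
// diagonal of D, so that A = L * diag(d) * L^T.
// Returns false on a zero pivot (singular matrix or pivoting required).
bool ldlt(const std::vector<double>& a, std::size_t n,
          std::vector<double>& L, std::vector<double>& d) {
    assert(a.size() == n * n);
    L.assign(n * n, 0.0);
    d.assign(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        // d_j = a_jj - sum_{k<j} L_jk^2 * d_k  (no square root is taken)
        double dj = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k)
            dj -= L[j * n + k] * L[j * n + k] * d[k];
        if (dj == 0.0) return false;
        d[j] = dj;
        L[j * n + j] = 1.0;
        // Compute 1/d_j once and multiply by it instead of dividing per entry.
        const double inv_dj = 1.0 / dj;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k)
                s -= L[i * n + k] * L[j * n + k] * d[k];
            L[i * n + j] = s * inv_dj;
        }
    }
    return true;
}
```

Compared with the standard factorization A = G G^T, no square roots are needed; the pivots appear explicitly in D instead.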
2024-09-30 16:17:22 +03:00
Liutong HAN
8a36f119ce Add the HAL implementation for the merge function on RISC-V Vector 2024-09-29 13:39:53 +00:00
Javier Macias Sola
679931dcde Enhance cv::TickMeter to be able to get the last elapsed time 2024-09-28 12:24:36 +02:00
Suleyman TURKMEN
48a48fe11c Enable PNG exif orientation test 2024-09-27 00:04:12 +03:00
Alexander Smorkalov
450e741f8d
Merge pull request #26176 from najasnake12:fixed_minor_typos_in_js_tutorials
Fixed minor typos in JS tutorials
2024-09-23 08:51:44 +03:00
Alexander Smorkalov
a6ec12f58b
Merge pull request #26163 from asmorkalov:as/HAL_schaar_deriv
HAL interface for Scharr derivatives needed for Lucas-Kanade algorithm #26163

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-09-23 08:44:22 +03:00
Scott
50eebbd21f Fixed minor typos in js tutorials 2024-09-22 11:40:06 +02:00
Alexander Smorkalov
b2e118ea94
Merge pull request #26166 from mshabunin:fix-intrin-ops
build: fix AVX2/AVX512 builds that failed due to intrinsics operator usage
2024-09-20 19:09:45 +03:00
Maksim Shabunin
6ef357fd54 build: fix AVX2/AVX512 builds that failed due to intrinsics operator usage 2024-09-20 13:38:59 +03:00
CSBVision
fab419a484 Update op_cuda.hpp 2024-09-20 12:00:17 +02:00
Suleyman TURKMEN
f503890c2b
Merge pull request #26152 from sturkmen72:m_buf_supported
Documentation update for imgcodecs #26152

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-09-17 09:34:38 +03:00
Alessandro de Oliveira Faria (A.K.A.CABELO)
e043d5d9d6
Merge pull request #26154 from cabelo:yolov5l
Added and tested yolov5l model. #26154

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [X] The feature is well documented and sample code can be built with the project CMake

Below is evidence of the test:

![v5l](https://github.com/user-attachments/assets/f31eff0b-11fc-44de-bdaf-640e67d1d924)
2024-09-17 08:58:22 +03:00
Alexander Smorkalov
881440c6c6
Merge pull request #26143 from asmorkalov:as/HAL_opticalFlowLK
Added HAL interface for Lucas-Kanade optical flow #26143

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-09-16 17:07:06 +03:00
Alexander Smorkalov
ee685017c3
Merge pull request #26153 from onurcankaraman:sample/trackerUpdate
sample: tracker parameters updated
2024-09-16 10:31:01 +03:00
Onur Can KARAMAN
aa11a898a4 sample: tracker parameters updated
Signed-off-by: Onur Can KARAMAN <onurcankaraman340@gmail.com>
2024-09-14 23:36:57 +03:00
Alexander Smorkalov
e1fec15627
Merge pull request #26148 from mshabunin:fix-sift-corruption
features2d: fixed out of bounds access in SIFT
2024-09-13 15:46:00 +03:00
Maksim Shabunin
6308739638 features2d: fixed out of bounds access in SIFT 2024-09-13 14:30:27 +03:00
Alexander Smorkalov
bf998429f6
Merge pull request #26146 from mshabunin:fix-test-overrides
ts: add some missing override markers
2024-09-13 13:33:54 +03:00
Maksim Shabunin
9663245459 ts: add some missing override markers 2024-09-13 12:48:05 +03:00
Robert Mitchell
f143f45fa2
Merge pull request #25785 from refmitchell:issue_25784
Documentation update for minMaxLoc #25785

Fixes #25784

Update documentation for minMaxLoc to be more specific about when multi-channel images are and are not supported.

Testing:
Built documentation locally to check that updates were incorporated correctly.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-09-13 12:34:01 +03:00
Wanli
c8080aa415
Merge pull request #26109 from WanliZhong:univ_intrin_operator2warpper
Replace operators with wrapper functions on universal intrinsics backends #26109

This PR aims to replace the operators (logical, arithmetic, bitwise) with wrapper functions (v_add, v_eq, v_and, ...).

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-09-13 10:56:48 +03:00
Maksim Shabunin
4c81e174bf
Merge pull request #25901 from mshabunin:fix-riscv-aarch-baseline
RISC-V/AArch64: disable CPU features detection #25901

This PR is the first step in fixing current issues with NEON/RVV, FP16, BF16 and other CPU features on AArch64 and RISC-V platforms.

On AArch64 and RISC-V platforms, the target platform is usually set by default in the toolchain when it is built, in the cmake toolchain file, or by the user in CMAKE_CXX_FLAGS. There are two ways to set platform options: a) "-mcpu=<some_cpu>"; b) "-march=<arch description>" (e.g. "rv64gcv"). Furthermore, these platforms have no optimization "levels" like x86_64 does. Instead they have individual features (RVV, FP16, ...) which can be enabled or disabled. For example, suppose a user has "rv64gc" set by the toolchain and we want to enable RVV. We would then need to parse their current feature set and append "v" (the vector extension) to this string. This task is quite hard and the whole procedure is error-prone.
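A deliberately naive sketch of the "append `v`" step makes the fragility concrete. The function name is hypothetical. The sketch assumes multi-letter extensions are introduced with `_`, ignores the canonical ordering of single-letter extensions, and cannot do anything for `-mcpu=` configurations.

```cpp
#include <cassert>
#include <string>

// Naive: add the RISC-V vector extension 'v' to a -march string like "rv64gc".
// Limitations (intentionally left in): no canonical-order handling, no -mcpu
// support, and multi-letter extensions written without '_' are misparsed.
std::string appendRvvNaive(const std::string& march) {
    // Only touch strings that start with "rv32" or "rv64".
    if (march.rfind("rv32", 0) != 0 && march.rfind("rv64", 0) != 0)
        return march;
    // Single-letter extensions are assumed to end at the first '_'.
    const std::string::size_type underscore = march.find('_');
    const std::string::size_type end =
        (underscore == std::string::npos) ? march.size() : underscore;
    const std::string singleLetters = march.substr(4, end - 4);
    if (singleLetters.find('v') != std::string::npos)
        return march; // 'v' already present
    std::string result = march;
    result.insert(end, "v");
    return result;
}
```

For example, `appendRvvNaive("rv64gc_zfh")` yields `"rv64gcv_zfh"`. A string such as `"rv64gczba"` would be mangled, which is the kind of error the proposal below avoids.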

I propose to use "CPU_BASELINE=DETECT" by default on AArch64 and RISC-V platforms, and to remove the other features or make them read-only/detect-only, so that OpenCV does not add any extra "-march" flags to the default configuration. We would rely only on the flags provided by the compiler and the cmake toolchain file. Predefined configurations can be kept in our cmake toolchain files.

Changes made by this PR:
- `CMakeLists.txt`: 
  - use `CMAKE_CROSSCOMPILING` instead of `CMAKE_TOOLCHAIN_FILE` to detect cross-compilation. This might be useful in cases of native compilation with a toolchain file
  - removed obsolete variables `ENABLE_NEON` and `ENABLE_VFPV3`; the first one was turned ON by default on the AArch64 platform, which caused `CPU_BASELINE=NEON` to be set
  - raised the minimum allowed cmake version to 3.7 to allow using `CMAKE_CXX_FLAGS_INIT` in toolchain files
- added separate files with arch flags for native compilation on AArch64 and RISC-V; these files will be used in our toolchain files and in regular cmake
- use `DETECT` as the default value for `CPU_BASELINE`, also allow `NATIVE`; warn the user if other values are used (only for AArch64 and RISC-V)
- for each feature listed in `CPU_DISPATCH`, check whether the corresponding `CPU_${opt}_FLAGS_ON` has been provided; warn the user if it is empty (only for AArch64 and RISC-V)
- use the `CPU_BASELINE_DISABLE` variable to actually turn off the macros responsible for the corresponding features, even if they are enabled by the compiler
- removed the AArch64 feature merge procedure (it didn't support `-mcpu` and built-in `-march`)
- reworked AArch64 and two RISC-V cmake toolchain files (does not affect Android/OSX/iOS/Win):
  - use `CMAKE_CXX_FLAGS_INIT` to set compiler flags
  - use variables `ENABLE_BF16`, `ENABLE_DOTPROD`, `ENABLE_RVV`, `ENABLE_FP16` to control `-march`
  - AArch64: removed other compiler and linker flags
    - `-fdata-sections`, `-fsigned-char`, `-Wl,--no-undefined`, `-Wl,--gc-sections` - already set by OpenCV
    - `-Wa,--noexecstack`, `-Wl,-z,noexecstack`, `-Wl,-z,relro`, `-Wl,-z,now` - can be enabled by OpenCV via `ENABLE_HARDENING`
    - `-Wno-psabi` - this option is used to disable some warnings on older ARM platforms; removing it shouldn't cause harm
  - ARM: removed the same common flags as for AArch64, but kept `-mthumb`, `--fix-cortex-a8` and `-z nocopyreloc`
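With `CPU_BASELINE=DETECT`, the baseline is whatever the compiler flags already enable. One standalone way to see that set is to inspect the compiler's predefined macros. The macro names below come from the Arm C Language Extensions and the RISC-V C API. This is an illustrative probe only, not OpenCV's detection code.

```cpp
#include <cassert>
#include <string>

// List the CPU features enabled by the *current compiler flags*, using
// predefined macros only (no runtime CPU probing).
std::string compiledFeatures() {
    std::string s;
#if defined(__ARM_NEON)
    s += "NEON ";
#endif
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    s += "FP16 ";
#endif
#if defined(__ARM_FEATURE_DOTPROD)
    s += "DOTPROD ";
#endif
#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
    s += "BF16 ";
#endif
#if defined(__riscv_vector)
    s += "RVV ";
#endif
    return s.empty() ? std::string("none") : s;
}
```

Building the same file with different `-march`/`-mcpu` flags (for example `-march=armv8.2-a+fp16+dotprod` versus plain `-march=armv8-a`) shows how the toolchain flags alone determine the feature set.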
2024-09-12 18:07:24 +03:00
FantasqueX
85923c8f30
Merge pull request #26113 from FantasqueX:zlib-ng-2-2-1
Update zlib-ng to 2.2.1 #26113

Release: https://github.com/zlib-ng/zlib-ng/releases/tag/2.2.1
ARM diagnostics patch: https://github.com/zlib-ng/zlib-ng/pull/1774

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-09-12 16:05:24 +03:00
Alexander Smorkalov
7de3a8e960
Merge pull request #26088 from plctlab:rvp_pt2
3rdparty: NDSRVP - Part 2: Filter
2024-09-11 12:18:42 +03:00
Alexander Smorkalov
fcfdd311ab
Merge pull request #26134 from savuor:rv/mixed_arithm_channels
Mixed arithmetics tests: multichannel support
2024-09-10 13:20:04 +03:00
Alexander Smorkalov
976fb3e8d6
Merge pull request #26132 from asmorkalov:as/tlile_leakyRelu
Leaky RELU support for TFLite.
2024-09-10 10:28:35 +03:00
Rostislav Vasilikhin
8725a7e21c Mixed arithmetics tests: multichannel 2024-09-09 13:54:00 +02:00
Alexander Smorkalov
209802c9f6 Leaky RELU support for TFLite. 2024-09-09 12:40:35 +03:00
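For reference, Leaky ReLU is the element-wise function f(x) = x for x >= 0 and alpha * x otherwise. In TFLite, the slope `alpha` is carried in the op's `LeakyReluOptions`. The sketch below is a plain scalar reference with a hypothetical function name, not OpenCV's dnn layer code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference Leaky ReLU: negative inputs are scaled by alpha, others pass through.
std::vector<float> leakyRelu(const std::vector<float>& in, float alpha) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] >= 0.f ? in[i] : alpha * in[i];
    return out;
}
```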
pasbi
79faf857d9
Merge pull request #26042 from pasbi:add-PtrStepSz_size
Add size() to CUDA PtrStepSz #26042

According to [cppreference.com compiler support table](https://en.cppreference.com/w/cpp/compiler_support/17), `nvcc` supports `[[nodiscard]]` from version 11.
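The effect of `[[nodiscard]]` on a `size()` accessor can be shown with a hypothetical host-side struct. `PitchedView`, its fields and the `Size2` return type are illustrative assumptions. The real `PtrStepSz::size()` may have a different return type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pitched 2-D view; NOT OpenCV's cv::cuda::PtrStepSz.
struct Size2 { int width; int height; };

template <typename T>
struct PitchedView {
    T* data;
    std::size_t step; // row pitch in bytes
    int cols;
    int rows;

    // [[nodiscard]] (C++17): the compiler warns if the result is ignored,
    // e.g. a stray `view.size();` statement.
    [[nodiscard]] Size2 size() const { return Size2{cols, rows}; }
};
```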

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Related: https://github.com/opencv/opencv/pull/25659
2024-09-09 08:47:26 +03:00
Alexander Smorkalov
e5790c0241
Merge pull request #26116 from savuor:rv/warp_affine_perf_types
Added more data types for warpAffine() perf tests
2024-09-09 07:45:55 +03:00
Alexander Smorkalov
a7d942b681
Merge pull request #26125 from asmorkalov:as/HAL_fix_nullprt_leak
Excluded nullptr leak to arithmetic HAL got from empty Mat.
2024-09-07 14:20:34 +03:00
Alexander Smorkalov
307dc2a298 Excluded nullptr leak to arithmetic HAL got from empty Mat. 2024-09-06 16:49:14 +03:00
Alexander Smorkalov
6cc166985d
Merge pull request #26117 from FantasqueX:update-remap-tutorial-1
Update remap tutorial
2024-09-06 08:53:19 +03:00
Rostislav Vasilikhin
7590813b69
Merge pull request #26115 from savuor:rv/flip_ocl_dtypes
Added more data types to OCL flip() and rotate() perf tests #26115

Connected PR with updated sanity data: https://github.com/opencv/opencv_extra/pull/1206

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
2024-09-06 08:26:00 +03:00
Alexander Smorkalov
2a8d4c6025
Merge pull request #26120 from mshabunin:fix-rvv-init-4.x
RISC-V: remove statically initialized global RVV variables (4.x)
2024-09-06 08:21:47 +03:00
Maksim Shabunin
dbd53fe89a RISC-V: remove statically initialized global RVV variables 2024-09-05 19:50:43 +03:00
Letu Ren
b743edd466 Update remap tutorial
- Make x a math symbol
- Fix a typo
2024-09-05 16:54:37 +08:00
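`cv::remap` samples the source at per-pixel coordinates: dst(x, y) = src(map_x(x, y), map_y(x, y)). Below is a minimal single-channel, nearest-neighbour sketch with a constant border. The function and its row-major layout are hypothetical, and OpenCV itself supports more interpolation and border modes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-neighbour remap of a row-major single-channel image.
// mapX/mapY give, for every destination pixel, the source coordinates to sample.
std::vector<float> remapNearest(const std::vector<float>& src, int srcW, int srcH,
                                const std::vector<float>& mapX, const std::vector<float>& mapY,
                                int dstW, int dstH, float borderValue = 0.f) {
    std::vector<float> dst(static_cast<std::size_t>(dstW) * dstH);
    for (int y = 0; y < dstH; ++y)
        for (int x = 0; x < dstW; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * dstW + x;
            const int sx = static_cast<int>(std::lround(mapX[i]));
            const int sy = static_cast<int>(std::lround(mapY[i]));
            const bool inside = sx >= 0 && sx < srcW && sy >= 0 && sy < srcH;
            // Out-of-range coordinates take the constant border value.
            dst[i] = inside ? src[static_cast<std::size_t>(sy) * srcW + sx] : borderValue;
        }
    return dst;
}
```

Setting map_x(x, y) = cols - 1 - x and map_y(x, y) = y flips the image horizontally.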
Alexander Smorkalov
06db881ca9
Merge pull request #26114 from savuor:rv/rgb2gray_bit_exact
Added bit-exact tests for RGB2Gray
2024-09-05 11:29:29 +03:00
Rostislav Vasilikhin
c0a0852f05 added more data types for warpAffine() perf tests 2024-09-05 05:46:17 +02:00
Rostislav Vasilikhin
9ef574a213 added bit-exact tests for RGB2Gray 2024-09-05 03:34:35 +02:00
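Bit-exact tests compare against integer arithmetic rather than floating point. The hypothetical reference below uses BT.601 weights 0.299/0.587/0.114 scaled by 2^14 and rounded to 4899/9617/1868, which sum to exactly 16384. Whether OpenCV's kernels use exactly these constants is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-point RGB -> gray with round-half-up; integer-only, so bit-exact everywhere.
inline uint8_t rgbToGrayFixed(uint8_t r, uint8_t g, uint8_t b) {
    constexpr int kShift = 14;
    constexpr int kR = 4899, kG = 9617, kB = 1868;
    static_assert(kR + kG + kB == (1 << kShift), "weights must sum to 1.0");
    const int y = (r * kR + g * kG + b * kB + (1 << (kShift - 1))) >> kShift;
    return static_cast<uint8_t>(y);
}
```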
Alexander Smorkalov
76f495dce9
Merge pull request #26105 from mshabunin:fix-cmake-3.30
build: minor changes for cmake 3.30 and some cleanup
2024-09-04 09:16:28 +03:00
Maksim Shabunin
32d3d6fa97 build: minor changes for cmake 3.30 and some cleanup 2024-09-03 16:35:45 +03:00
Vincent Rabaud
8561f45c2a
Merge pull request #26084 from vrabaud:avif_check
Avoid uninitialized value read in resize. #26084

When there is no point to the right, a hypothetical value is computed (but never used) from an uninitialized `ofst`. This triggers warnings in the sanitizers.

Including those values in the for loops is also possible but messy when SIMD is involved.
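Below is a minimal sketch of the general pattern, not the actual resize code or this PR's exact fix. When building per-pixel offsets for 1-D linear upscaling, the last source sample has no right neighbour. Clamping that neighbour keeps every offset initialized, even where its weight is irrelevant. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Left/right source offsets and weights for 1-D linear upscaling srcLen -> dstLen.
struct Taps { std::vector<int> left, right; std::vector<float> w; };

Taps linearTaps(int srcLen, int dstLen) {
    assert(srcLen > 0 && dstLen > 0);
    Taps t{std::vector<int>(dstLen), std::vector<int>(dstLen), std::vector<float>(dstLen)};
    const float scale = static_cast<float>(srcLen) / dstLen;
    for (int x = 0; x < dstLen; ++x) {
        float fx = (x + 0.5f) * scale - 0.5f; // pixel-centre alignment
        if (fx < 0.f) fx = 0.f;
        int sx = static_cast<int>(fx);
        if (sx > srcLen - 1) sx = srcLen - 1;
        t.left[x] = sx;
        // Always written: at the right edge the "neighbour" is the same sample.
        t.right[x] = (sx + 1 < srcLen) ? sx + 1 : sx;
        t.w[x] = fx - static_cast<float>(sx);
    }
    return t;
}
```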

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
2024-09-03 15:16:22 +03:00
tingboliao
88f99edc65
Merge pull request #26071 from tingboliao:4.x
Remove the redundant codes of cv::convertMaps and mRGBA2RGBA<uchar> #26071

(1) cv::convertMaps: the branch [else if( m1type == CV_32FC2 && dstm1type == CV_16SC2 ) if( nninterpolate )] is unreachable,
    because that condition is already handled in lines 1959 to 1961, where the result is computed in advance and the function returns directly.
(2) mRGBA2RGBA<uchar>: dst[0], dst[1], dst[2] and dst[3] are calculated repeatedly. Introduced in https://github.com/opencv/opencv/pull/13440
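`mRGBA2RGBA` converts premultiplied-alpha RGBA back to straight alpha. The hypothetical scalar sketch below handles one 8-bit pixel and computes each channel exactly once. The rounding term `a / 2` is an assumption, so results may not match OpenCV bit-for-bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One premultiplied RGBA pixel -> straight alpha: c' = round(c * 255 / a), a == 0 -> 0.
void unpremultiplyPixel(const uint8_t src[4], uint8_t dst[4]) {
    const unsigned a = src[3];
    for (int c = 0; c < 3; ++c) {
        // Each colour channel is written once.
        const unsigned v = (a == 0) ? 0u : (src[c] * 255u + a / 2) / a;
        dst[c] = static_cast<uint8_t>(std::min(v, 255u));
    }
    dst[3] = src[3]; // alpha is copied unchanged
}
```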

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-09-03 07:56:37 +03:00
Alexander Smorkalov
e9c3e1acb5
Merge pull request #26102 from FantasqueX:make-t-a-math-symbol
Make T a math symbol
2024-09-03 07:50:27 +03:00
Letu Ren
3995ad8458 Make T a math symbol 2024-09-03 10:54:10 +08:00
Suleyman TURKMEN
e2ba36bf9c
Merge pull request #26093 from sturkmen72:related_issue_22090
Update test_tiff.cpp #26093

related #22090

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
2024-09-02 15:26:24 +03:00
Alexander Smorkalov
960aaa32db
Merge pull request #26094 from catree:fix_cameramatrix_doc_typo
Fix typo with cameramatrix command for documentation
2024-09-02 09:34:13 +03:00
Alexander Smorkalov
b72d7e3b05
Merge pull request #26091 from asmorkalov:as/arm_version_check
Got rid of CAROTENE_NEON_ARCH and use standard __ARM_ARCH check
2024-09-02 09:02:28 +03:00
catree
165bf25c46 Fix typo with cameramatrix command for documentation.
Fix link for "RANSAC for Dummies" tutorial.
2024-09-01 01:03:57 +02:00
Alexander Smorkalov
a905526f71 Got rid of CAROTENE_NEON_ARCH and use standard __ARM_ARCH check. 2024-08-30 12:09:04 +03:00
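This change switches code-path selection from the project-specific `CAROTENE_NEON_ARCH` to the compiler-defined `__ARM_ARCH`. The scalar sketch below mirrors the `vroundq_u32_f32` hunk further down, with a hypothetical function name. `std::nearbyint` stands in for the ties-to-even `vcvtnq_u32_f32`, and the fallback's 0.5f delta is an assumption, since `CAROTENE_ROUND_DELTA`'s value is not shown in this diff.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

inline uint32_t roundToU32(float v) {
    assert(v >= 0.f);
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 8)
    // ARMv8+: round to nearest, ties to even (scalar analogue of vcvtnq_u32_f32).
    return static_cast<uint32_t>(std::nearbyint(v));
#else
    // ARMv7 and non-ARM builds: add a rounding delta (assumed 0.5f) and truncate.
    return static_cast<uint32_t>(v + 0.5f);
#endif
}
```

The two paths agree except on exact ties. For example, 2.5f becomes 2 on the ties-to-even path and 3 on the add-and-truncate path.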
llh721113
e087cc8fd1 feat: NDSRVP Filter 2024-08-30 07:59:51 +08:00
650 changed files with 41680 additions and 20170 deletions

@ -6,24 +6,21 @@ on:
- 4.x
jobs:
Linux:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-Linux.yaml@main
with:
workflow_branch: main
Ubuntu2004-ARM64:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-ARM64.yaml@main
Ubuntu2004-ARM64-Debug:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-ARM64-Debug.yaml@main
Ubuntu2004-x64:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-U20.yaml@main
Ubuntu2004-x64-OpenVINO:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-U20-OpenVINO.yaml@main
Ubuntu2204-x64:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-U22.yaml@main
Ubuntu2404-x64:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-U24.yaml@main
Ubuntu2004-x64-CUDA:
if: "${{ contains(github.event.pull_request.labels.*.name, 'category: dnn') }} || ${{ contains(github.event.pull_request.labels.*.name, 'category: dnn (onnx)') }}"
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-U20-Cuda.yaml@main
@ -31,9 +28,6 @@ jobs:
Windows10-x64:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-W10.yaml@main
Windows10-ARM64:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-W10-ARM64.yaml@main
Windows10-x64-Vulkan:
uses: opencv/ci-gha-workflow/.github/workflows/OCV-PR-4.x-W10-Vulkan.yaml@main

@ -42,17 +42,9 @@ endif()
if(WITH_NEON)
target_compile_definitions(carotene_objs PRIVATE "-DWITH_NEON")
if(NOT DEFINED CAROTENE_NEON_ARCH )
elseif(CAROTENE_NEON_ARCH EQUAL 8)
target_compile_definitions(carotene_objs PRIVATE "-DCAROTENE_NEON_ARCH=8")
elseif(CAROTENE_NEON_ARCH EQUAL 7)
target_compile_definitions(carotene_objs PRIVATE "-DCAROTENE_NEON_ARCH=7")
else()
target_compile_definitions(carotene_objs PRIVATE "-DCAROTENE_NEON_ARCH=0")
endif()
endif()
if(MINGW)
if(MINGW)
target_compile_definitions(carotene_objs PRIVATE "-D_USE_MATH_DEFINES=1")
endif()

@ -119,7 +119,7 @@ private: \
#define TEGRA_BINARYOP(type, op, src1, sz1, src2, sz2, dst, sz, w, h) \
( \
CAROTENE_NS::isSupportedConfiguration() ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_##op##_Invoker<const type, type>(src1, sz1, src2, sz2, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK \
@ -154,7 +154,7 @@ TegraUnaryOp_Invoker(bitwiseNot, bitwiseNot)
#define TEGRA_UNARYOP(type, op, src1, sz1, dst, sz, w, h) \
( \
CAROTENE_NS::isSupportedConfiguration() ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_##op##_Invoker<const type, type>(src1, sz1, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK \
@ -254,32 +254,32 @@ TegraGenOp_Invoker(cmpLE, cmpGE, 2, 1, 0, RANGE_DATA(ST, src2_data, src2_step),
( \
CAROTENE_NS::isSupportedConfiguration() ? \
((op) == cv::CMP_EQ) ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_cmpEQ_Invoker<const type, CAROTENE_NS::u8>(src1, sz1, src2, sz2, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
((op) == cv::CMP_NE) ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_cmpNE_Invoker<const type, CAROTENE_NS::u8>(src1, sz1, src2, sz2, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
((op) == cv::CMP_GT) ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_cmpGT_Invoker<const type, CAROTENE_NS::u8>(src1, sz1, src2, sz2, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
((op) == cv::CMP_GE) ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_cmpGE_Invoker<const type, CAROTENE_NS::u8>(src1, sz1, src2, sz2, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
((op) == cv::CMP_LT) ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_cmpLT_Invoker<const type, CAROTENE_NS::u8>(src1, sz1, src2, sz2, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
((op) == cv::CMP_LE) ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_cmpLE_Invoker<const type, CAROTENE_NS::u8>(src1, sz1, src2, sz2, dst, sz, w, h), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
@ -310,7 +310,7 @@ TegraGenOp_Invoker(cmpLE, cmpGE, 2, 1, 0, RANGE_DATA(ST, src2_data, src2_step),
#define TEGRA_BINARYOPSCALE(type, op, src1, sz1, src2, sz2, dst, sz, w, h, scales) \
( \
CAROTENE_NS::isSupportedConfiguration() ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_##op##_Invoker<const type, type>(src1, sz1, src2, sz2, dst, sz, w, h, scales), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK \
@ -332,7 +332,7 @@ TegraBinaryOpScale_Invoker(divf, div, 1, scale)
#define TEGRA_UNARYOPSCALE(type, op, src1, sz1, dst, sz, w, h, scales) \
( \
CAROTENE_NS::isSupportedConfiguration() ? \
parallel_for_(Range(0, h), \
parallel_for_(cv::Range(0, h), \
TegraGenOp_##op##_Invoker<const type, type>(src1, sz1, dst, sz, w, h, scales), \
(w * h) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK \
@ -928,17 +928,17 @@ TegraRowOp_Invoker(split4, split4, 1, 4, 0, RANGE_DATA(ST, src1_data, 4*sizeof(S
( \
CAROTENE_NS::isSupportedConfiguration() ? \
cn == 2 ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_split2_Invoker<const type, type>(src, dst[0], dst[1]), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
cn == 3 ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_split3_Invoker<const type, type>(src, dst[0], dst[1], dst[2]), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
cn == 4 ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_split4_Invoker<const type, type>(src, dst[0], dst[1], dst[2], dst[3]), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
@ -990,17 +990,17 @@ TegraRowOp_Invoker(combine4, combine4, 4, 1, 0, RANGE_DATA(ST, src1_data, sizeof
( \
CAROTENE_NS::isSupportedConfiguration() ? \
cn == 2 ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_combine2_Invoker<const type, type>(src[0], src[1], dst), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
cn == 3 ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_combine3_Invoker<const type, type>(src[0], src[1], src[2], dst), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
cn == 4 ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_combine4_Invoker<const type, type>(src[0], src[1], src[2], src[3], dst), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
@ -1033,7 +1033,7 @@ TegraRowOp_Invoker(phase, phase, 2, 1, 1, RANGE_DATA(ST, src1_data, sizeof(CAROT
#define TEGRA_FASTATAN(y, x, dst, len, angleInDegrees) \
( \
CAROTENE_NS::isSupportedConfiguration() ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_phase_Invoker<const CAROTENE_NS::f32, CAROTENE_NS::f32>(x, y, dst, angleInDegrees ? 1.0f : M_PI/180), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK \
@ -1049,7 +1049,7 @@ TegraRowOp_Invoker(magnitude, magnitude, 2, 1, 0, RANGE_DATA(ST, src1_data, size
#define TEGRA_MAGNITUDE(x, y, dst, len) \
( \
CAROTENE_NS::isSupportedConfiguration() ? \
parallel_for_(Range(0, len), \
parallel_for_(cv::Range(0, len), \
TegraRowOp_magnitude_Invoker<const CAROTENE_NS::f32, CAROTENE_NS::f32>(x, y, dst), \
(len) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK \
@ -1563,17 +1563,17 @@ TegraCvtColor_Invoker(rgbx2bgrx, rgbx2bgrx, src_data + static_cast<size_t>(range
scn == 3 ? \
dcn == 3 ? \
swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2bgr_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
CV_HAL_ERROR_NOT_IMPLEMENTED : \
dcn == 4 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2bgrx_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2rgbx_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
@ -1581,16 +1581,16 @@ TegraCvtColor_Invoker(rgbx2bgrx, rgbx2bgrx, src_data + static_cast<size_t>(range
scn == 4 ? \
dcn == 3 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2bgr_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2rgb_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
dcn == 4 ? \
swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2bgrx_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
@ -1613,19 +1613,19 @@ TegraCvtColor_Invoker(rgbx2rgb565, rgbx2rgb565, src_data + static_cast<size_t>(r
greenBits == 6 && CAROTENE_NS::isSupportedConfiguration() ? \
scn == 3 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2bgr565_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2rgb565_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
scn == 4 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2bgr565_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2rgb565_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
@ -1646,19 +1646,19 @@ TegraCvtColor_Invoker(bgrx2gray, bgrx2gray, CAROTENE_NS::COLOR_SPACE_BT601, src_
depth == CV_8U && CAROTENE_NS::isSupportedConfiguration() ? \
scn == 3 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2gray_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgr2gray_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
scn == 4 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2gray_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgrx2gray_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
@ -1674,12 +1674,12 @@ TegraCvtColor_Invoker(gray2rgbx, gray2rgbx, src_data + static_cast<size_t>(range
( \
depth == CV_8U && CAROTENE_NS::isSupportedConfiguration() ? \
dcn == 3 ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_gray2rgb_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
dcn == 4 ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_gray2rgbx_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)), \
CV_HAL_ERROR_OK : \
@ -1700,19 +1700,19 @@ TegraCvtColor_Invoker(bgrx2ycrcb, bgrx2ycrcb, src_data + static_cast<size_t>(ran
isCbCr && depth == CV_8U && CAROTENE_NS::isSupportedConfiguration() ? \
scn == 3 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2ycrcb_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgr2ycrcb_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
scn == 4 ? \
(swapBlue ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2ycrcb_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgrx2ycrcb_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
@ -1742,34 +1742,34 @@ TegraCvtColor_Invoker(bgrx2hsvf, bgrx2hsv, src_data + static_cast<size_t>(range.
scn == 3 ? \
(swapBlue ? \
isFullRange ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2hsvf_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgb2hsv_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
isFullRange ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgr2hsvf_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgr2hsv_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
scn == 4 ? \
(swapBlue ? \
isFullRange ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2hsvf_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_rgbx2hsv_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
isFullRange ? \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgrx2hsvf_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) : \
parallel_for_(Range(0, height), \
parallel_for_(cv::Range(0, height), \
TegraCvtColor_bgrx2hsv_Invoker(src_data, src_step, dst_data, dst_step, width, height), \
(width * height) / static_cast<double>(1<<16)) ), \
CV_HAL_ERROR_OK : \
@ -1857,7 +1857,7 @@ TegraCvtColor_Invoker(bgrx2hsvf, bgrx2hsv, src_data + static_cast<size_t>(range.
#endif
// The optimized branch was developed for old armv7 processors and leads to perf degradation on armv8
#if defined(DCAROTENE_NEON_ARCH) && (DCAROTENE_NEON_ARCH == 7)
#if defined(__ARM_ARCH) && (__ARM_ARCH == 7)
inline CAROTENE_NS::BORDER_MODE borderCV2Carotene(int borderType)
{
switch(borderType)
@ -1928,8 +1928,54 @@ inline int TEGRA_GaussianBlurBinomial(const uchar* src_data, size_t src_step, uc
#undef cv_hal_gaussianBlurBinomial
#define cv_hal_gaussianBlurBinomial TEGRA_GaussianBlurBinomial
#endif // DCAROTENE_NEON_ARCH=7
#endif // __ARM_ARCH=7
#endif // OPENCV_IMGPROC_HAL_INTERFACE_H
// The optimized branch was developed for old armv7 processors
#if defined(__ARM_ARCH) && (__ARM_ARCH == 7)
inline int TEGRA_LKOpticalFlowLevel(const uchar *prev_data, size_t prev_data_step,
const short* prev_deriv_data, size_t prev_deriv_step,
const uchar* next_data, size_t next_step,
int width, int height, int cn,
const float *prev_points, float *next_points, size_t point_count,
uchar *status, float *err,
const int win_width, const int win_height,
int termination_count, double termination_epsilon,
bool get_min_eigen_vals,
float min_eigen_vals_threshold)
{
if (!CAROTENE_NS::isSupportedConfiguration())
return CV_HAL_ERROR_NOT_IMPLEMENTED;
CAROTENE_NS::pyrLKOptFlowLevel(CAROTENE_NS::Size2D(width, height), cn,
prev_data, prev_data_step, prev_deriv_data, prev_deriv_step,
next_data, next_step,
point_count, prev_points, next_points,
status, err, CAROTENE_NS::Size2D(win_width, win_height),
termination_count, termination_epsilon,
get_min_eigen_vals, min_eigen_vals_threshold);
return CV_HAL_ERROR_OK;
}
#undef cv_hal_LKOpticalFlowLevel
#define cv_hal_LKOpticalFlowLevel TEGRA_LKOpticalFlowLevel
#endif // __ARM_ARCH=7
#if 0 // OpenCV provides faster parallel implementation
inline int TEGRA_ScharrDeriv(const uchar* src_data, size_t src_step,
short* dst_data, size_t dst_step,
int width, int height, int cn)
{
if (!CAROTENE_NS::isSupportedConfiguration())
return CV_HAL_ERROR_NOT_IMPLEMENTED;
CAROTENE_NS::ScharrDeriv(CAROTENE_NS::Size2D(width, height), cn, src_data, src_step, dst_data, dst_step);
return CV_HAL_ERROR_OK;
}
#undef cv_hal_ScharrDeriv
#define cv_hal_ScharrDeriv TEGRA_ScharrDeriv
#endif
#endif
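The hunk above follows OpenCV's HAL replacement convention. A `cv_hal_*` macro is `#undef`-ed and redefined to a vendor function, and that function may return `CV_HAL_ERROR_NOT_IMPLEMENTED` (as `TEGRA_LKOpticalFlowLevel` does for unsupported configurations) so that the caller falls back to its generic path. Below is a self-contained sketch of that pattern. All names (`hal_absdiff`, `my_absdiff`, `HAL_OK`, `HAL_NOT_IMPLEMENTED`) are hypothetical, so it compiles without OpenCV headers.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical status codes mirroring CV_HAL_ERROR_OK / CV_HAL_ERROR_NOT_IMPLEMENTED.
enum { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

// Default hook: "not implemented", so the generic path runs.
inline int hal_absdiff_default(const int*, const int*, int*, int) { return HAL_NOT_IMPLEMENTED; }
#define hal_absdiff hal_absdiff_default

// A vendor header replaces the hook; it may still decline some inputs.
inline int my_absdiff(const int* a, const int* b, int* dst, int len) {
    if (len < 4) return HAL_NOT_IMPLEMENTED; // e.g. unsupported configuration
    for (int i = 0; i < len; ++i) dst[i] = std::abs(a[i] - b[i]);
    return HAL_OK;
}
#undef hal_absdiff
#define hal_absdiff my_absdiff

// Library entry point: try the HAL first, fall back to the generic loop.
// Returns true when the HAL path handled the call.
inline bool absdiff(const int* a, const int* b, int* dst, int len) {
    if (hal_absdiff(a, b, dst, len) == HAL_OK) return true;
    for (int i = 0; i < len; ++i) dst[i] = std::abs(a[i] - b[i]);
    return false;
}
```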

@ -2485,7 +2485,7 @@ namespace CAROTENE_NS {
u8 *status, f32 *err,
const Size2D &winSize,
u32 terminationCount, f64 terminationEpsilon,
u32 level, u32 maxLevel, bool useInitialFlow, bool getMinEigenVals,
bool getMinEigenVals,
f32 minEigThreshold);
}

@ -58,17 +58,6 @@
namespace CAROTENE_NS { namespace internal {
#ifndef CAROTENE_NEON_ARCH
# if defined(__aarch64__) || defined(__aarch32__)
# define CAROTENE_NEON_ARCH 8
# else
# define CAROTENE_NEON_ARCH 7
# endif
#endif
#if ( !defined(__aarch64__) && !defined(__aarch32__) ) && (CAROTENE_NEON_ARCH == 8 )
# error("ARMv7 doen't support A32/A64 Neon instructions")
#endif
inline void prefetch(const void *ptr, size_t offset = 32*10)
{
#if defined __GNUC__

@ -58,7 +58,7 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
u8 *status, f32 *err,
const Size2D &winSize,
u32 terminationCount, f64 terminationEpsilon,
u32 level, u32 maxLevel, bool useInitialFlow, bool getMinEigenVals,
bool getMinEigenVals,
f32 minEigThreshold)
{
internal::assertSupportedConfiguration();
@ -74,32 +74,11 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
for( u32 ptidx = 0; ptidx < ptCount; ptidx++ )
{
f32 levscale = (1./(1 << level));
u32 ptref = ptidx << 1;
f32 prevPtX = prevPts[ptref+0]*levscale;
f32 prevPtY = prevPts[ptref+1]*levscale;
f32 nextPtX;
f32 nextPtY;
if( level == maxLevel )
{
if( useInitialFlow )
{
nextPtX = nextPts[ptref+0]*levscale;
nextPtY = nextPts[ptref+1]*levscale;
}
else
{
nextPtX = prevPtX;
nextPtY = prevPtY;
}
}
else
{
nextPtX = nextPts[ptref+0]*2.f;
nextPtY = nextPts[ptref+1]*2.f;
}
nextPts[ptref+0] = nextPtX;
nextPts[ptref+1] = nextPtY;
f32 prevPtX = prevPts[ptref+0];
f32 prevPtY = prevPts[ptref+1];
f32 nextPtX = nextPts[ptref+0];
f32 nextPtY = nextPts[ptref+1];
s32 iprevPtX, iprevPtY;
s32 inextPtX, inextPtY;
@ -111,13 +90,10 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
if( iprevPtX < -(s32)winSize.width || iprevPtX >= (s32)size.width ||
iprevPtY < -(s32)winSize.height || iprevPtY >= (s32)size.height )
{
if( level == 0 )
{
if( status )
status[ptidx] = false;
if( err )
err[ptidx] = 0;
}
if( status )
status[ptidx] = false;
if( err )
err[ptidx] = 0;
continue;
}
@ -333,7 +309,7 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
if( minEig < minEigThreshold || D < FLT_EPSILON )
{
if( level == 0 && status )
if( status )
status[ptidx] = false;
continue;
}
@ -353,7 +329,7 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
if( inextPtX < -(s32)winSize.width || inextPtX >= (s32)size.width ||
inextPtY < -(s32)winSize.height || inextPtY >= (s32)size.height )
{
if( level == 0 && status )
if( status )
status[ptidx] = false;
break;
}
@ -469,8 +445,7 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
prevDeltaX = deltaX;
prevDeltaY = deltaY;
}
if( status && status[ptidx] && err && level == 0 && !getMinEigenVals )
if( status && status[ptidx] && err && !getMinEigenVals )
{
f32 nextPointX = nextPts[ptref+0] - halfWinX;
f32 nextPointY = nextPts[ptref+1] - halfWinY;
@ -526,9 +501,6 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
(void)winSize;
(void)terminationCount;
(void)terminationEpsilon;
(void)level;
(void)maxLevel;
(void)useInitialFlow;
(void)getMinEigenVals;
(void)minEigThreshold;
(void)ptCount;
@ -536,4 +508,3 @@ void pyrLKOptFlowLevel(const Size2D &size, s32 cn,
}
}//CAROTENE_NS
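The removed lines in pyrLKOptFlowLevel rescaled every point into the current pyramid level (levscale = 1/(1 << level)) and doubled the estimate carried down from the coarser level. After this change the function reads prevPts/nextPts as-is, which suggests the caller now does that bookkeeping. The sketch below restates the removed arithmetic and the window-bounds rejection test in plain C++. Pt, toLevel, upsampleEstimate and windowInside are hypothetical names, not Carotene API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical point type for illustration only (not part of Carotene).
struct Pt { float x, y; };

// Map a full-resolution point into pyramid level `level` (each level halves
// the resolution), mirroring the removed levscale = 1./(1 << level) step.
inline Pt toLevel(Pt p, uint32_t level)
{
    assert(level < 31);
    const float s = 1.0f / static_cast<float>(1u << level);
    return {p.x * s, p.y * s};
}

// Carry an estimate from level L+1 down to level L by doubling it,
// mirroring the removed nextPts[...] * 2.f branch.
inline Pt upsampleEstimate(Pt p) { return {p.x * 2.0f, p.y * 2.0f}; }

// Same rejection test as the iprevPtX/iprevPtY check: the integer window
// origin must lie in [-win, size) on both axes, otherwise the point is lost.
inline bool windowInside(int32_t ix, int32_t iy,
                         int32_t winW, int32_t winH,
                         int32_t width, int32_t height)
{
    return !(ix < -winW || ix >= width || iy < -winH || iy >= height);
}
```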

@ -57,7 +57,7 @@ namespace CAROTENE_NS { namespace internal {
inline uint32x4_t vroundq_u32_f32(const float32x4_t val)
{
#if CAROTENE_NEON_ARCH >= 8 /* get ready for ARMv9 */
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 8)
return vcvtnq_u32_f32(val);
#else
const float32x4_t delta = vdupq_n_f32(CAROTENE_ROUND_DELTA);
@ -67,7 +67,7 @@ inline uint32x4_t vroundq_u32_f32(const float32x4_t val)
inline uint32x2_t vround_u32_f32(const float32x2_t val)
{
#if CAROTENE_NEON_ARCH >= 8 /* get ready for ARMv9 */
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 8)
return vcvtn_u32_f32(val);
#else
const float32x2_t delta = vdup_n_f32(CAROTENE_ROUND_DELTA);
@ -77,7 +77,7 @@ inline uint32x2_t vround_u32_f32(const float32x2_t val)
inline int32x4_t vroundq_s32_f32(const float32x4_t val)
{
#if CAROTENE_NEON_ARCH >= 8 /* get ready for ARMv9 */
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 8)
return vcvtnq_s32_f32(val);
#else
const float32x4_t delta = vdupq_n_f32(CAROTENE_ROUND_DELTA);
@ -87,7 +87,7 @@ inline int32x4_t vroundq_s32_f32(const float32x4_t val)
inline int32x2_t vround_s32_f32(const float32x2_t val)
{
#if CAROTENE_NEON_ARCH >= 8 /* get ready for ARMv9 */
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 8)
return vcvtn_s32_f32(val);
#else
const float32x2_t delta = vdup_n_f32(CAROTENE_ROUND_DELTA);
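On ARMv8 targets the rounding helpers use vcvtnq_*/vcvtn_*, which round to nearest with ties to even; other targets take a fallback built on CAROTENE_ROUND_DELTA, whose body is cut off in this diff. The scalar sketch below assumes the fallback behaves like "add 0.5, then truncate" (an assumption, not the actual constant) to show that the two paths can disagree on exact .5 ties. The function names are illustrative.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>

// Scalar model of vcvtn*: round to nearest, ties to even.
// std::nearbyint honours the current rounding mode; FE_TONEAREST is the default.
inline uint32_t roundTiesToEven(float v)
{
    assert(v >= 0.0f);
    return static_cast<uint32_t>(std::nearbyint(v));
}

// Assumed model of a delta-based fallback: add 0.5 and truncate toward zero.
// The real CAROTENE_ROUND_DELTA value is not shown in this diff.
inline uint32_t roundHalfUp(float v)
{
    assert(v >= 0.0f);
    return static_cast<uint32_t>(v + 0.5f);
}
```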

3rdparty/fastcv/CMakeLists.txt

@ -0,0 +1,32 @@
if(HAVE_FASTCV)
set(FASTCV_HAL_VERSION 0.0.1 CACHE INTERNAL "")
set(FASTCV_HAL_LIBRARIES "fastcv_hal" CACHE INTERNAL "")
set(FASTCV_HAL_INCLUDE_DIRS "${CMAKE_CURRENT_SOURCE_DIR}/include" CACHE INTERNAL "")
set(FASTCV_HAL_HEADERS
"${CMAKE_CURRENT_SOURCE_DIR}/include/fastcv_hal_core.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/include/fastcv_hal_imgproc.hpp"
CACHE INTERNAL "")
file(GLOB FASTCV_HAL_FILES "${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp")
add_library(fastcv_hal STATIC ${FASTCV_HAL_FILES})
target_include_directories(fastcv_hal PRIVATE
${CMAKE_SOURCE_DIR}/modules/core/include
${CMAKE_SOURCE_DIR}/modules/imgproc/include
${FASTCV_HAL_INCLUDE_DIRS} ${FastCV_INCLUDE_PATH})
target_link_libraries(fastcv_hal PUBLIC ${FASTCV_LIBRARY})
set_target_properties(fastcv_hal PROPERTIES ARCHIVE_OUTPUT_DIRECTORY ${3P_LIBRARY_OUTPUT_PATH})
if(NOT BUILD_SHARED_LIBS)
ocv_install_target(fastcv_hal EXPORT OpenCVModules ARCHIVE DESTINATION ${OPENCV_3P_LIB_INSTALL_PATH} COMPONENT dev)
endif()
if(ENABLE_SOLUTION_FOLDERS)
set_target_properties(fastcv_hal PROPERTIES FOLDER "3rdparty")
endif()
else()
message(STATUS "FastCV is not available, disabling related HAL")
endif(HAVE_FASTCV)

3rdparty/fastcv/fastcv.cmake

@ -0,0 +1,44 @@
function(download_fastcv root_dir)
# Commit SHA in the opencv_3rdparty repo
set(FASTCV_COMMIT "f4413cc2ab7233fdfc383a4cded402c072677fb0")
# Define actual FastCV versions
if(ANDROID)
if(AARCH64)
message(STATUS "Download FastCV for Android aarch64")
set(FCV_PACKAGE_NAME "fastcv_android_aarch64_2024_12_11.tgz")
set(FCV_PACKAGE_HASH "9dac41e86597305f846212dae31a4a88")
else()
message(STATUS "Download FastCV for Android armv7")
set(FCV_PACKAGE_NAME "fastcv_android_arm32_2024_12_11.tgz")
set(FCV_PACKAGE_HASH "fe2d30334180b17e3031eee92aac43b6")
endif()
elseif(UNIX AND NOT APPLE AND NOT IOS AND NOT XROS)
if(AARCH64)
set(FCV_PACKAGE_NAME "fastcv_linux_aarch64_2025_02_12.tgz")
set(FCV_PACKAGE_HASH "33ac2a59cf3e7d6402eee2e010de1202")
else()
message("FastCV: fastcv lib for 32-bit Linux is not supported for now!")
endif()
endif(ANDROID)
# Download Package
set(OPENCV_FASTCV_URL "https://raw.githubusercontent.com/opencv/opencv_3rdparty/${FASTCV_COMMIT}/fastcv/")
ocv_download( FILENAME ${FCV_PACKAGE_NAME}
HASH ${FCV_PACKAGE_HASH}
URL ${OPENCV_FASTCV_URL}
DESTINATION_DIR ${root_dir}
ID FASTCV
STATUS res
UNPACK
RELATIVE_URL)
if(res)
set(HAVE_FASTCV TRUE CACHE BOOL "FastCV status")
else()
message(WARNING "FastCV: package download failed!")
endif()
endfunction()

@ -0,0 +1,222 @@
/*
* Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*/
#ifndef OPENCV_FASTCV_HAL_CORE_HPP_INCLUDED
#define OPENCV_FASTCV_HAL_CORE_HPP_INCLUDED
#include <opencv2/core/base.hpp>
#undef cv_hal_lut
#define cv_hal_lut fastcv_hal_lut
#undef cv_hal_normHammingDiff8u
#define cv_hal_normHammingDiff8u fastcv_hal_normHammingDiff8u
#undef cv_hal_mul8u16u
#define cv_hal_mul8u16u fastcv_hal_mul8u16u
#undef cv_hal_sub8u32f
#define cv_hal_sub8u32f fastcv_hal_sub8u32f
#undef cv_hal_transpose2d
#define cv_hal_transpose2d fastcv_hal_transpose2d
#undef cv_hal_meanStdDev
#define cv_hal_meanStdDev fastcv_hal_meanStdDev
#undef cv_hal_flip
#define cv_hal_flip fastcv_hal_flip
#undef cv_hal_rotate90
#define cv_hal_rotate90 fastcv_hal_rotate
#undef cv_hal_addWeighted8u
#define cv_hal_addWeighted8u fastcv_hal_addWeighted8u
#undef cv_hal_mul8u
#define cv_hal_mul8u fastcv_hal_mul8u
#undef cv_hal_mul16s
#define cv_hal_mul16s fastcv_hal_mul16s
#undef cv_hal_mul32f
#define cv_hal_mul32f fastcv_hal_mul32f
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief look-up table transform of an array.
/// @param src_data Source image data
/// @param src_step Source image step
/// @param src_type Source image type
/// @param lut_data Pointer to lookup table
/// @param lut_channel_size Size of each channel in bytes
/// @param lut_channels Number of channels in lookup table
/// @param dst_data Destination data
/// @param dst_step Destination step
/// @param width Width of images
/// @param height Height of images
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_lut(
const uchar* src_data,
size_t src_step,
size_t src_type,
const uchar* lut_data,
size_t lut_channel_size,
size_t lut_channels,
uchar* dst_data,
size_t dst_step,
int width,
int height);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Hamming distance between two vectors
/// @param a pointer to first vector data
/// @param b pointer to second vector data
/// @param n length of vectors
/// @param cellSize how many bits of the vectors will be added and treated as a single bit, can be 1 (standard Hamming distance), 2 or 4
/// @param result pointer to result output
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_normHammingDiff8u(const uchar* a, const uchar* b, int n, int cellSize, int* result);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_mul8u16u(
const uchar * src1_data,
size_t src1_step,
const uchar * src2_data,
size_t src2_step,
ushort * dst_data,
size_t dst_step,
int width,
int height,
double scale);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_sub8u32f(
const uchar *src1_data,
size_t src1_step,
const uchar *src2_data,
size_t src2_step,
float *dst_data,
size_t dst_step,
int width,
int height);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_transpose2d(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int src_width,
int src_height,
int element_size);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_meanStdDev(
const uchar * src_data,
size_t src_step,
int width,
int height,
int src_type,
double * mean_val,
double * stddev_val,
uchar * mask,
size_t mask_step);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Flips a 2D array around vertical, horizontal, or both axes
/// @param src_type source and destination image type
/// @param src_data source image data
/// @param src_step source image step
/// @param src_width source and destination image width
/// @param src_height source and destination image height
/// @param dst_data destination image data
/// @param dst_step destination image step
/// @param flip_mode 0 flips around x-axis, 1 around y-axis, -1 both
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_flip(
int src_type,
const uchar* src_data,
size_t src_step,
int src_width,
int src_height,
uchar* dst_data,
size_t dst_step,
int flip_mode);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Rotates a 2D array in multiples of 90 degrees.
/// @param src_type source and destination image type
/// @param src_data source image data
/// @param src_step source image step
/// @param src_width source image width
///        If angle has value [180] it is also destination image width
///        If angle has values [90, 270] it is also destination image height
/// @param src_height source image height
///        If angle has value [180] it is also destination image height
///        If angle has values [90, 270] it is also destination image width
/// @param dst_data destination image data
/// @param dst_step destination image step
/// @param angle clockwise angle for rotation in degrees from set [90, 180, 270]
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_rotate(
int src_type,
const uchar* src_data,
size_t src_step,
int src_width,
int src_height,
uchar* dst_data,
size_t dst_step,
int angle);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief weighted sum of two arrays using formula: dst[i] = a * src1[i] + b * src2[i] + c
/// @param src1_data first source image data
/// @param src1_step first source image step
/// @param src2_data second source image data
/// @param src2_step second source image step
/// @param dst_data destination image data
/// @param dst_step destination image step
/// @param width width of the images
/// @param height height of the images
/// @param scalars numbers a, b, and c
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_addWeighted8u(
const uchar* src1_data,
size_t src1_step,
const uchar* src2_data,
size_t src2_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
const double scalars[3]);
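The scalars array carries alpha, beta and gamma, so the operation is dst = saturate(alpha*src1 + beta*src2 + gamma), as in cv::addWeighted. A scalar reference under that reading, plus a copy of the range check that fastcv_hal_addWeighted8u applies before calling fcvAddWeightedu8_v2, might look like this. addWeightedRef and fastcvAcceptsScalars are illustrative names, and the rounding rule (std::lround) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Scalar reference for dst = saturate(a*src1 + b*src2 + c) on 8-bit data.
// Rounding uses std::lround (half away from zero); optimized kernels may
// round ties differently.
inline void addWeightedRef(const uint8_t* src1, const uint8_t* src2, uint8_t* dst,
                           std::size_t n, const double scalars[3])
{
    assert(n == 0 || (src1 && src2 && dst));
    for (std::size_t i = 0; i < n; ++i)
    {
        const double v = scalars[0] * src1[i] + scalars[1] * src2[i] + scalars[2];
        const long r = std::lround(v);
        dst[i] = static_cast<uint8_t>(std::clamp(r, 0L, 255L));
    }
}

// Mirrors the range check in fastcv_hal_addWeighted8u: weights outside
// [-128, 128) or an offset outside [-2^23, 2^23) fall back to OpenCV.
inline bool fastcvAcceptsScalars(const double s[3])
{
    return s[0] >= -128.0 && s[0] < 128.0 &&
           s[1] >= -128.0 && s[1] < 128.0 &&
           s[2] >= -(1 << 23) && s[2] < (1 << 23);
}
```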
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_mul8u(
const uchar *src1_data,
size_t src1_step,
const uchar *src2_data,
size_t src2_step,
uchar *dst_data,
size_t dst_step,
int width,
int height,
double scale);
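In fastcv_hal_core.cpp, fastcv_hal_mul8u only accepts scale factors that are powers of two between 1/256 and 256 and converts them to a shift sF with scale = 2^-sF (2 maps to -1, 1/8 maps to 3). A compact sketch of that mapping with std::frexp follows. scaleToShift is a hypothetical helper, and it compares exactly instead of using the FLT_EPSILON tolerance of FCV_CMP_EQ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

// Map a multiply scale to the shift used by fastcv_hal_mul8u: scale = 2^-sF,
// so 1 -> 0, 4 -> -2, 1/8 -> 3. Anything that is not an exact power of two
// in [1/256, 256] is unsupported and yields std::nullopt.
inline std::optional<int8_t> scaleToShift(double scale)
{
    if (!(scale > 0.0))
        return std::nullopt;                      // zero, negative or NaN
    int exp = 0;
    const double mant = std::frexp(scale, &exp);  // scale = mant * 2^exp, mant in [0.5, 1)
    if (mant != 0.5)
        return std::nullopt;                      // not an exact power of two
    const int log2s = exp - 1;
    if (log2s < -8 || log2s > 8)
        return std::nullopt;                      // outside [1/256, 256]
    return static_cast<int8_t>(-log2s);
}
```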
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_mul16s(
const short *src1_data,
size_t src1_step,
const short *src2_data,
size_t src2_step,
short *dst_data,
size_t dst_step,
int width,
int height,
double scale);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_mul32f(
const float *src1_data,
size_t src1_step,
const float *src2_data,
size_t src2_step,
float *dst_data,
size_t dst_step,
int width,
int height,
double scale);
#endif
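Each #undef/#define pair at the top of this header swaps OpenCV's default HAL hook for a FastCV wrapper: the OpenCV function calls the hook first and runs its own implementation when the hook reports "not implemented". The self-contained sketch below imitates that pattern with made-up names (hal_ni_scale, vendor_scale, my_hal_scale, HAL_OK, HAL_NOT_IMPLEMENTED); it is not OpenCV's actual dispatch code.

```cpp
#include <cassert>

enum { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };  // stand-ins for CV_HAL_ERROR_*

// 1. The library ships a default hook that always declines.
inline int hal_ni_scale(const int*, int*, int) { return HAL_NOT_IMPLEMENTED; }
#define my_hal_scale hal_ni_scale

// 2. A vendor header overrides the macro, like
//    "#undef cv_hal_lut / #define cv_hal_lut fastcv_hal_lut".
inline int vendor_scale(const int* src, int* dst, int n)
{
    if (n < 4)
        return HAL_NOT_IMPLEMENTED;  // decline small inputs, as several FastCV wrappers do
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * 2;
    return HAL_OK;
}
#undef my_hal_scale
#define my_hal_scale vendor_scale

// 3. The public function tries the hook first and keeps its generic path.
//    Returns true when the vendor code handled the call.
inline bool scaleByTwo(const int* src, int* dst, int n)
{
    assert(n >= 0);
    if (my_hal_scale(src, dst, n) == HAL_OK)
        return true;
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * 2;  // generic fallback
    return false;
}
```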

@ -0,0 +1,268 @@
/*
* Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*/
#ifndef OPENCV_FASTCV_HAL_IMGPROC_HPP_INCLUDED
#define OPENCV_FASTCV_HAL_IMGPROC_HPP_INCLUDED
#include <opencv2/core/base.hpp>
#undef cv_hal_medianBlur
#define cv_hal_medianBlur fastcv_hal_medianBlur
#undef cv_hal_sobel
#define cv_hal_sobel fastcv_hal_sobel
#undef cv_hal_boxFilter
#define cv_hal_boxFilter fastcv_hal_boxFilter
#undef cv_hal_adaptiveThreshold
#define cv_hal_adaptiveThreshold fastcv_hal_adaptiveThreshold
#undef cv_hal_gaussianBlurBinomial
#define cv_hal_gaussianBlurBinomial fastcv_hal_gaussianBlurBinomial
#undef cv_hal_warpPerspective
#define cv_hal_warpPerspective fastcv_hal_warpPerspective
#undef cv_hal_pyrdown
#define cv_hal_pyrdown fastcv_hal_pyrdown
#undef cv_hal_cvtBGRtoHSV
#define cv_hal_cvtBGRtoHSV fastcv_hal_cvtBGRtoHSV
#undef cv_hal_cvtBGRtoYUVApprox
#define cv_hal_cvtBGRtoYUVApprox fastcv_hal_cvtBGRtoYUVApprox
#undef cv_hal_canny
#define cv_hal_canny fastcv_hal_canny
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Calculate medianBlur filter
/// @param src_data Source image data
/// @param src_step Source image step
/// @param dst_data Destination image data
/// @param dst_step Destination image step
/// @param width Source image width
/// @param height Source image height
/// @param depth Depth of source and destination images
/// @param cn Number of channels
/// @param ksize Size of kernel
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_medianBlur(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
int depth,
int cn,
int ksize);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Computes Sobel derivatives
///
/// @param src_data Source image data
/// @param src_step Source image step
/// @param dst_data Destination image data
/// @param dst_step Destination image step
/// @param width Source image width
/// @param height Source image height
/// @param src_depth Depth of source image
/// @param dst_depth Depth of destination image
/// @param cn Number of channels
/// @param margin_left Left margin of source image
/// @param margin_top Top margin of source image
/// @param margin_right Right margin of source image
/// @param margin_bottom Bottom margin of source image
/// @param dx Order of the derivative in x
/// @param dy Order of the derivative in y
/// @param ksize Size of kernel
/// @param scale Scale factor for the computed derivative values
/// @param delta Delta value that is added to the results prior to storing them in dst
/// @param border_type Border type
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_sobel(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
int src_depth,
int dst_depth,
int cn,
int margin_left,
int margin_top,
int margin_right,
int margin_bottom,
int dx,
int dy,
int ksize,
double scale,
double delta,
int border_type);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_boxFilter(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
int src_depth,
int dst_depth,
int cn,
int margin_left,
int margin_top,
int margin_right,
int margin_bottom,
size_t ksize_width,
size_t ksize_height,
int anchor_x,
int anchor_y,
bool normalize,
int border_type);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_adaptiveThreshold(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
double maxValue,
int adaptiveMethod,
int thresholdType,
int blockSize,
double C);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Blurs an image using a Gaussian filter.
/// @param src_data Source image data
/// @param src_step Source image step
/// @param dst_data Destination image data
/// @param dst_step Destination image step
/// @param width Source image width
/// @param height Source image height
/// @param depth Depth of source and destination image
/// @param cn Number of channels
/// @param margin_left Left margin of source image
/// @param margin_top Top margin of source image
/// @param margin_right Right margin of source image
/// @param margin_bottom Bottom margin of source image
/// @param ksize Kernel size
/// @param border_type Border type
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_gaussianBlurBinomial(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
int depth,
int cn,
size_t margin_left,
size_t margin_top,
size_t margin_right,
size_t margin_bottom,
size_t ksize,
int border_type);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Applies a perspective transformation to an image.
///
/// @param src_type Source and destination image type
/// @param src_data Source image data
/// @param src_step Source image step
/// @param src_width Source image width
/// @param src_height Source image height
/// @param dst_data Destination image data
/// @param dst_step Destination image step
/// @param dst_width Destination image width
/// @param dst_height Destination image height
/// @param M 3x3 matrix with transform coefficients
/// @param interpolation Interpolation mode (CV_HAL_INTER_NEAREST, ...)
/// @param border_type Border processing mode (CV_HAL_BORDER_REFLECT, ...)
/// @param border_value Values to use for CV_HAL_BORDER_CONSTANT mode
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_warpPerspective(
int src_type,
const uchar* src_data,
size_t src_step,
int src_width,
int src_height,
uchar* dst_data,
size_t dst_step,
int dst_width,
int dst_height,
const double M[9],
int interpolation,
int border_type,
const double border_value[4]);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_pyrdown(
const uchar* src_data,
size_t src_step,
int src_width,
int src_height,
uchar* dst_data,
size_t dst_step,
int dst_width,
int dst_height,
int depth,
int cn,
int border_type);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_cvtBGRtoHSV(
const uchar * src_data,
size_t src_step,
uchar * dst_data,
size_t dst_step,
int width,
int height,
int depth,
int scn,
bool swapBlue,
bool isFullRange,
bool isHSV);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_cvtBGRtoYUVApprox(
const uchar * src_data,
size_t src_step,
uchar * dst_data,
size_t dst_step,
int width,
int height,
int depth,
int scn,
bool swapBlue,
bool isCbCr);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/// @brief Canny edge detector
/// @param src_data Source image data
/// @param src_step Source image step
/// @param dst_data Destination image data
/// @param dst_step Destination image step
/// @param width Source image width
/// @param height Source image height
/// @param cn Number of channels
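/// @param lowThreshold Lower threshold for the hysteresis procedure
/// @param highThreshold Upper threshold for the hysteresis procedure
/// @param ksize Kernel size for the Sobel operator
/// @param L2gradient Flag indicating whether the L2 norm (true) or the L1 norm (false) is used for the gradient magnitude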
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
int fastcv_hal_canny(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
int cn,
double lowThreshold,
double highThreshold,
int ksize,
bool L2gradient);
#endif
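Canny's L2gradient flag selects how the gradient magnitude is computed from the Sobel responses before the low/high hysteresis thresholds are applied. A minimal sketch of the two choices, using a hypothetical gradientMagnitude helper; the L1 form is the cheaper one and is the default in cv::Canny.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Gradient magnitude used for Canny thresholding.
//   L2gradient == true : sqrt(gx^2 + gy^2)
//   L2gradient == false: |gx| + |gy|
inline double gradientMagnitude(int gx, int gy, bool L2gradient)
{
    if (L2gradient)
        return std::hypot(static_cast<double>(gx), static_cast<double>(gy));
    return static_cast<double>(std::abs(gx) + std::abs(gy));
}
```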

@ -0,0 +1,84 @@
/*
* Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*/
#ifndef OPENCV_FASTCV_HAL_UTILS_HPP_INCLUDED
#define OPENCV_FASTCV_HAL_UTILS_HPP_INCLUDED
#include "fastcv.h"
#include <opencv2/core/utils/logger.hpp>
#define INITIALIZATION_CHECK \
{ \
if (!FastCvContext::getContext().isInitialized) \
{ \
return CV_HAL_ERROR_UNKNOWN; \
} \
}
#define CV_HAL_RETURN(status, func) \
{ \
if( status == FASTCV_SUCCESS ) \
{ \
CV_LOG_DEBUG(NULL, "FastCV HAL for "<<#func<<" ran successfully!"); \
return CV_HAL_ERROR_OK; \
} \
else if(status == FASTCV_EBADPARAM || status == FASTCV_EUNALIGNPARAM || \
status == FASTCV_EUNSUPPORTED || status == FASTCV_EHWQDSP || \
status == FASTCV_EHWGPU) \
{ \
CV_LOG_DEBUG(NULL, "FastCV status:"<<getFastCVErrorString(status) \
<<", Switching to default OpenCV solution!"); \
return CV_HAL_ERROR_NOT_IMPLEMENTED; \
} \
else \
{ \
CV_LOG_ERROR(NULL,"FastCV error:"<<getFastCVErrorString(status)); \
return CV_HAL_ERROR_UNKNOWN; \
} \
}
#define CV_HAL_RETURN_NOT_IMPLEMENTED(reason) \
{ \
CV_LOG_DEBUG(NULL,"Switching to default OpenCV\nInfo: "<<reason); \
return CV_HAL_ERROR_NOT_IMPLEMENTED; \
}
#define FCV_KernelSize_SHIFT 3
#define FCV_MAKETYPE(ksize,depth) (((ksize)<<FCV_KernelSize_SHIFT) + (depth))
#define FCV_CMP_EQ(val1,val2) (fabs((val1) - (val2)) < FLT_EPSILON)
const char* getFastCVErrorString(int status);
const char* borderToString(int border);
const char* interpolationToString(int interpolation);
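FCV_MAKETYPE packs a kernel size and an OpenCV depth code into one integer (kernel size above bit 3, depth in the low three bits), presumably so a wrapper can switch on both in a single case label. The constexpr sketch below mirrors the macro and adds inverse helpers; makeType, typeKernel and typeDepth are illustrative names, and the round trip only holds while depth codes fit in three bits, as CV_8U (0) through CV_16F (7) do in OpenCV 4.x.

```cpp
#include <cassert>

// Mirrors FCV_KernelSize_SHIFT / FCV_MAKETYPE from fastcv_hal_utils.hpp.
constexpr int kKernelShift = 3;
constexpr int makeType(int ksize, int depth)
{
    assert(depth >= 0 && depth < (1 << kKernelShift));
    return (ksize << kKernelShift) + depth;
}

// Inverse helpers (not in the original header); valid while depth < 8.
constexpr int typeKernel(int packed) { return packed >> kKernelShift; }
constexpr int typeDepth(int packed)  { return packed & ((1 << kKernelShift) - 1); }

// Example: a 3x3 kernel on CV_8U (depth code 0) packs to 24.
static_assert(makeType(3, 0) == 24, "3 << 3 == 24");
```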
struct FastCvContext
{
public:
// Lazily initialized on first call: the function-local static is
// constructed exactly once (thread-safe since C++11).
static FastCvContext& getContext()
{
static FastCvContext context;
return context;
}
FastCvContext()
{
if (fcvSetOperationMode(FASTCV_OP_CPU_PERFORMANCE) != 0)
{
CV_LOG_WARNING(NULL, "Failed to switch FastCV operation mode");
isInitialized = false;
}
else
{
CV_LOG_INFO(NULL, "FastCV Operation Mode Switched");
isInitialized = true;
}
}
bool isInitialized;
};
#endif
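CV_HAL_RETURN sorts FastCV statuses into three outcomes: success becomes CV_HAL_ERROR_OK; the bad-parameter, alignment, unsupported and hardware (QDSP/GPU) errors become CV_HAL_ERROR_NOT_IMPLEMENTED so OpenCV falls back to its own code; anything else becomes CV_HAL_ERROR_UNKNOWN. Because the macro returns from the enclosing function, the same mapping is easier to test as a function. The enums below are hypothetical stand-ins for the FastCV and OpenCV constants.

```cpp
#include <cassert>

// Hypothetical stand-ins for fcvStatus values and CV_HAL_ERROR_* codes.
enum class FcvStatus { Success, BadParam, Unaligned, Unsupported, HwQdsp, HwGpu, Other };
enum HalResult { HalOk = 0, HalNotImplemented = 1, HalUnknown = -1 };

// Function form of CV_HAL_RETURN's decision logic.
inline HalResult toHalResult(FcvStatus s)
{
    switch (s)
    {
    case FcvStatus::Success:
        return HalOk;
    case FcvStatus::BadParam:
    case FcvStatus::Unaligned:
    case FcvStatus::Unsupported:
    case FcvStatus::HwQdsp:
    case FcvStatus::HwGpu:
        return HalNotImplemented;  // recoverable: let OpenCV run its own code
    default:
        return HalUnknown;         // reported as a real error
    }
}
```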

3rdparty/fastcv/src/fastcv_hal_core.cpp

@ -0,0 +1,574 @@
/*
* Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*/
#include "fastcv_hal_core.hpp"
#include "fastcv_hal_utils.hpp"
#include <opencv2/core/core.hpp>
#include <opencv2/core/base.hpp>
class ParallelTableLookup : public cv::ParallelLoopBody
{
public:
ParallelTableLookup(const uchar* src_data_, int width_, size_t src_step_, const uchar* lut_data_, uchar* dst_data_, size_t dst_step_) :
cv::ParallelLoopBody(), src_data(src_data_), width(width_), src_step(src_step_), lut_data(lut_data_), dst_data(dst_data_), dst_step(dst_step_)
{
}
virtual void operator()(const cv::Range& range) const CV_OVERRIDE
{
fcvStatus status = FASTCV_SUCCESS;
for (int y = range.start; y < range.end; y++) {
status = fcvTableLookupu8((uint8_t*)src_data + y * src_step, width, 1, src_step, (uint8_t*)lut_data, (uint8_t*)dst_data + y * dst_step, dst_step);
if(status != FASTCV_SUCCESS)
CV_LOG_ERROR(NULL,"FastCV error:"<<getFastCVErrorString(status));
}
}
private:
const uchar* src_data;
int width;
size_t src_step;
const uchar* lut_data;
uchar* dst_data;
size_t dst_step;
};
int fastcv_hal_lut(
const uchar* src_data,
size_t src_step,
size_t src_type,
const uchar* lut_data,
size_t lut_channel_size,
size_t lut_channels,
uchar* dst_data,
size_t dst_step,
int width,
int height)
{
if((width*height)<=(320*240))
CV_HAL_RETURN_NOT_IMPLEMENTED("Switching to default OpenCV solution!");
INITIALIZATION_CHECK;
fcvStatus status;
if (src_type == CV_8UC1 && lut_channels == 1 && lut_channel_size == 1)
{
cv::parallel_for_(cv::Range(0, height),
ParallelTableLookup(src_data, width, src_step, lut_data, dst_data, dst_step));
status = FASTCV_SUCCESS;
CV_HAL_RETURN(status, hal_lut);
}
else
{
CV_HAL_RETURN_NOT_IMPLEMENTED("Multi-channel input is not supported");
}
}
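fastcv_hal_lut only handles a single-channel 8-bit source with a one-channel, one-byte LUT, and declines images of 320x240 or smaller. ParallelTableLookup splits the rows across cv::parallel_for_ and calls fcvTableLookupu8 per row. The portable sketch below shows the same row-range body without FastCV; lutRows is a hypothetical name, and the two ranges in the usage run one after another where the real code would run them in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Row-range body in the spirit of ParallelTableLookup: each call processes
// rows [rowBegin, rowEnd), so disjoint ranges can run concurrently.
// Steps are in bytes and may include row padding.
inline void lutRows(const uint8_t* src, std::size_t srcStep,
                    uint8_t* dst, std::size_t dstStep,
                    int width, const uint8_t lut[256],
                    int rowBegin, int rowEnd)
{
    assert(width >= 0 && rowBegin <= rowEnd);
    for (int y = rowBegin; y < rowEnd; ++y)
    {
        const uint8_t* s = src + static_cast<std::size_t>(y) * srcStep;
        uint8_t* d = dst + static_cast<std::size_t>(y) * dstStep;
        for (int x = 0; x < width; ++x)
            d[x] = lut[s[x]];
    }
}
```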
int fastcv_hal_normHammingDiff8u(
const uchar* a,
const uchar* b,
int n,
int cellSize,
int* result)
{
fcvStatus status;
if (cellSize != 1)
CV_HAL_RETURN_NOT_IMPLEMENTED(cv::format("NORM_HAMMING2 cellSize:%d is not supported", cellSize));
INITIALIZATION_CHECK;
uint32_t dist = 0;
dist = fcvHammingDistanceu8((uint8_t*)a, (uint8_t*)b, n);
*result = dist;
status = FASTCV_SUCCESS;
CV_HAL_RETURN(status, hal_normHammingDiff8u);
}
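With cellSize == 1 the wrapper counts differing bits via fcvHammingDistanceu8; other cell sizes (NORM_HAMMING2) go back to OpenCV. A small reference covering cellSize 1 and 2 is sketched below. hammingDistance is an illustrative name, cellSize 4 is left out, and the 2-bit variant folds each pair of XOR bits into one flag bit before counting.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hamming distance over n bytes.
//   cellSize 1: number of differing bits.
//   cellSize 2: number of 2-bit cells that differ in at least one bit.
inline int hammingDistance(const uint8_t* a, const uint8_t* b, int n, int cellSize)
{
    assert(cellSize == 1 || cellSize == 2);
    int dist = 0;
    for (int i = 0; i < n; ++i)
    {
        unsigned x = static_cast<unsigned>(a[i] ^ b[i]);
        if (cellSize == 2)
            x = (x | (x >> 1)) & 0x55u;  // one flag bit per 2-bit cell
        dist += static_cast<int>(std::bitset<8>(x).count());
    }
    return dist;
}
```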
int fastcv_hal_mul8u16u(
const uchar* src1_data,
size_t src1_step,
const uchar* src2_data,
size_t src2_step,
ushort* dst_data,
size_t dst_step,
int width,
int height,
double scale)
{
if(scale != 1.0)
CV_HAL_RETURN_NOT_IMPLEMENTED("Scale factor not supported");
INITIALIZATION_CHECK;
fcvStatus status = FASTCV_SUCCESS;
if (src1_step < (size_t)width && src2_step < (size_t)width)
{
src1_step = width*sizeof(uchar);
src2_step = width*sizeof(uchar);
dst_step = width*sizeof(ushort);
}
status = fcvElementMultiplyu8u16_v2(src1_data, width, height, src1_step,
src2_data, src2_step, dst_data, dst_step);
CV_HAL_RETURN(status,hal_multiply);
}
int fastcv_hal_sub8u32f(
const uchar* src1_data,
size_t src1_step,
const uchar* src2_data,
size_t src2_step,
float* dst_data,
size_t dst_step,
int width,
int height)
{
INITIALIZATION_CHECK;
fcvStatus status = FASTCV_SUCCESS;
if (src1_step < (size_t)width && src2_step < (size_t)width)
{
src1_step = width*sizeof(uchar);
src2_step = width*sizeof(uchar);
dst_step = width*sizeof(float);
}
status = fcvImageDiffu8f32_v2(src1_data, src2_data, width, height, src1_step,
src2_step, dst_data, dst_step);
CV_HAL_RETURN(status,hal_subtract);
}
int fastcv_hal_transpose2d(
const uchar* src_data,
size_t src_step,
uchar* dst_data,
size_t dst_step,
int src_width,
int src_height,
int element_size)
{
INITIALIZATION_CHECK;
if (src_data == dst_data)
CV_HAL_RETURN_NOT_IMPLEMENTED("In-place not supported");
fcvStatus status = FASTCV_SUCCESS;
switch (element_size)
{
case 1:
status = fcvTransposeu8_v2(src_data, src_width, src_height, src_step,
dst_data, dst_step);
break;
case 2:
status = fcvTransposeu16_v2((const uint16_t*)src_data, src_width, src_height,
src_step, (uint16_t*)dst_data, dst_step);
break;
case 4:
status = fcvTransposef32_v2((const float32_t*)src_data, src_width, src_height,
src_step, (float32_t*)dst_data, dst_step);
break;
default:
CV_HAL_RETURN_NOT_IMPLEMENTED("srcType not supported");
}
CV_HAL_RETURN(status,hal_transpose);
}
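fastcv_hal_transpose2d dispatches on element_size (1, 2 or 4 bytes) and rejects in-place calls. A byte-generic reference with the same contract is sketched here; transpose2d is a hypothetical name, and it copies one element at a time with std::memcpy, so it is meant for checking results rather than for speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Out-of-place transpose for any element size. src is srcWidth x srcHeight
// elements; dst is srcHeight x srcWidth. Steps are in bytes.
inline void transpose2d(const uint8_t* src, std::size_t srcStep,
                        uint8_t* dst, std::size_t dstStep,
                        int srcWidth, int srcHeight, int elemSize)
{
    assert(src != dst);  // like the wrapper, no in-place support
    const std::size_t es = static_cast<std::size_t>(elemSize);
    for (int y = 0; y < srcHeight; ++y)
        for (int x = 0; x < srcWidth; ++x)
            std::memcpy(dst + static_cast<std::size_t>(x) * dstStep + static_cast<std::size_t>(y) * es,
                        src + static_cast<std::size_t>(y) * srcStep + static_cast<std::size_t>(x) * es,
                        es);
}
```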
int fastcv_hal_meanStdDev(
const uchar* src_data,
size_t src_step,
int width,
int height,
int src_type,
double* mean_val,
double* stddev_val,
uchar* mask,
size_t mask_step)
{
INITIALIZATION_CHECK;
CV_UNUSED(mask_step);
if(src_type != CV_8UC1)
{
CV_HAL_RETURN_NOT_IMPLEMENTED("src type not supported");
}
else if(mask != nullptr)
{
CV_HAL_RETURN_NOT_IMPLEMENTED("mask not supported");
}
else if(mean_val == nullptr && stddev_val == nullptr)
{
CV_HAL_RETURN_NOT_IMPLEMENTED("null ptr for mean and stddev");
}
float32_t mean, variance;
fcvStatus status = fcvImageIntensityStats_v2(src_data, src_step, 0, 0, width, height,
&mean, &variance, FASTCV_BIASED_VARIANCE_ESTIMATOR);
if(mean_val != nullptr)
*mean_val = mean;
if(stddev_val != nullptr)
*stddev_val = std::sqrt(variance);
CV_HAL_RETURN(status,hal_meanStdDev);
}
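fastcv_hal_meanStdDev asks FastCV for the biased variance (divide by N, not N - 1) and takes its square root, which matches how cv::meanStdDev reports the standard deviation. The reference below computes the same quantities for a strided 8-bit image. meanStdDev8u is an illustrative name, and the E[x^2] - E[x]^2 form is adequate for 8-bit data but can lose precision on wide-range floating-point input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Mean and biased (population) standard deviation of a single-channel
// 8-bit image with a row step in bytes. Either output pointer may be null.
inline void meanStdDev8u(const uint8_t* src, std::size_t step, int width, int height,
                         double* mean, double* stddev)
{
    assert(width > 0 && height > 0);
    double sum = 0.0, sumSq = 0.0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            const double v = src[static_cast<std::size_t>(y) * step + static_cast<std::size_t>(x)];
            sum += v;
            sumSq += v * v;
        }
    const double n = static_cast<double>(width) * height;
    const double m = sum / n;
    const double var = sumSq / n - m * m;  // divide by N: biased estimator
    if (mean) *mean = m;
    if (stddev) *stddev = std::sqrt(var > 0.0 ? var : 0.0);
}
```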
int fastcv_hal_flip(
int src_type,
const uchar* src_data,
size_t src_step,
int src_width,
int src_height,
uchar* dst_data,
size_t dst_step,
int flip_mode)
{
INITIALIZATION_CHECK;
if(src_type!=CV_8UC1 && src_type!=CV_16UC1 && src_type!=CV_8UC3)
CV_HAL_RETURN_NOT_IMPLEMENTED("Data type is not supported, Switching to default OpenCV solution!");
if((src_width*src_height)<=(640*480))
CV_HAL_RETURN_NOT_IMPLEMENTED("Switching to default OpenCV solution!");
fcvStatus status = FASTCV_SUCCESS;
fcvFlipDir dir;
switch (flip_mode)
{
//Flip around X-Axis: Vertical Flip or FLIP_ROWS
case 0:
CV_HAL_RETURN_NOT_IMPLEMENTED("Switching to default OpenCV solution due to low perf!");
dir = FASTCV_FLIP_VERT;
break;
//Flip around Y-Axis: Horizontal Flip or FLIP_COLS
case 1:
dir = FASTCV_FLIP_HORIZ;
break;
//Flip around both X and Y-Axis or FLIP_BOTH
case -1:
dir = FASTCV_FLIP_BOTH;
break;
default:
CV_HAL_RETURN_NOT_IMPLEMENTED("Invalid flip_mode, Switching to default OpenCV solution!");
}
if(src_type==CV_8UC1)
fcvFlipu8(src_data, src_width, src_height, src_step, dst_data, dst_step, dir);
else if(src_type==CV_16UC1)
fcvFlipu16((uint16_t*)src_data, src_width, src_height, src_step, (uint16_t*)dst_data, dst_step, dir);
else if(src_type==CV_8UC3)
status = fcvFlipRGB888u8((uint8_t*)src_data, src_width, src_height, src_step, (uint8_t*)dst_data, dst_step, dir);
else
CV_HAL_RETURN_NOT_IMPLEMENTED(cv::format("Data type:%d is not supported, Switching to default OpenCV solution!", src_type));
CV_HAL_RETURN(status, hal_flip);
}
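The flip_mode values follow cv::flip: 0 reverses the rows (FASTCV_FLIP_VERT), 1 reverses the columns (FASTCV_FLIP_HORIZ) and -1 does both (FASTCV_FLIP_BOTH); the wrapper currently hands mode 0, and any image up to 640x480, back to OpenCV. A plain reference for the 8-bit single-channel case, using the hypothetical name flip8u, follows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference flip for a single-channel 8-bit image.
//   flipMode 0: around the x-axis (rows reversed)
//   flipMode 1: around the y-axis (columns reversed)
//   flipMode -1: both
inline void flip8u(const uint8_t* src, std::size_t srcStep, int width, int height,
                   uint8_t* dst, std::size_t dstStep, int flipMode)
{
    assert(flipMode == 0 || flipMode == 1 || flipMode == -1);
    const bool flipRows = (flipMode == 0 || flipMode == -1);
    const bool flipCols = (flipMode == 1 || flipMode == -1);
    for (int y = 0; y < height; ++y)
    {
        const int sy = flipRows ? height - 1 - y : y;
        for (int x = 0; x < width; ++x)
        {
            const int sx = flipCols ? width - 1 - x : x;
            dst[static_cast<std::size_t>(y) * dstStep + static_cast<std::size_t>(x)] =
                src[static_cast<std::size_t>(sy) * srcStep + static_cast<std::size_t>(sx)];
        }
    }
}
```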
int fastcv_hal_rotate(
int src_type,
const uchar* src_data,
size_t src_step,
int src_width,
int src_height,
uchar* dst_data,
size_t dst_step,
int angle)
{
if((src_width*src_height)<(120*80))
CV_HAL_RETURN_NOT_IMPLEMENTED("Switching to default OpenCV solution for lower resolution!");
fcvStatus status;
fcvRotateDegree degree;
if (src_type != CV_8UC1 && src_type != CV_8UC2)
CV_HAL_RETURN_NOT_IMPLEMENTED(cv::format("src_type:%d is not supported", src_type));
INITIALIZATION_CHECK;
switch (angle)
{
case 90:
degree = FASTCV_ROTATE_90;
break;
case 180:
degree = FASTCV_ROTATE_180;
break;
case 270:
degree = FASTCV_ROTATE_270;
break;
default:
CV_HAL_RETURN_NOT_IMPLEMENTED(cv::format("Rotation angle:%d is not supported", angle));
}
switch(src_type)
{
case CV_8UC1:
status = fcvRotateImageu8(src_data, src_width, src_height, src_step, dst_data, dst_step, degree);
break;
case CV_8UC2:
status = fcvRotateImageInterleavedu8((uint8_t*)src_data, src_width, src_height, src_step, (uint8_t*)dst_data,
dst_step, degree);
break;
default:
CV_HAL_RETURN_NOT_IMPLEMENTED(cv::format("src_type:%d is not supported", src_type));
}
CV_HAL_RETURN(status, hal_rotate);
}
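The header documents angle as a clockwise rotation from {90, 180, 270}, with width and height swapped for 90 and 270. A reference implementation under that reading is sketched below; rotate8u is illustrative, and the direction convention is taken from the header comment rather than verified against FastCV itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Clockwise rotation of a single-channel 8-bit image (w x h) by 90, 180 or
// 270 degrees. For 90 and 270 the output is h wide and w tall; for 180 the
// size is unchanged. Steps are in bytes.
inline void rotate8u(const uint8_t* src, std::size_t srcStep, int w, int h,
                     uint8_t* dst, std::size_t dstStep, int angle)
{
    assert(angle == 90 || angle == 180 || angle == 270);
    const int dw = (angle == 180) ? w : h;
    const int dh = (angle == 180) ? h : w;
    for (int y = 0; y < dh; ++y)
        for (int x = 0; x < dw; ++x)
        {
            int sx = 0, sy = 0;
            if (angle == 90)       { sx = y;         sy = h - 1 - x; }
            else if (angle == 180) { sx = w - 1 - x; sy = h - 1 - y; }
            else                   { sx = w - 1 - y; sy = x; }
            dst[static_cast<std::size_t>(y) * dstStep + static_cast<std::size_t>(x)] =
                src[static_cast<std::size_t>(sy) * srcStep + static_cast<std::size_t>(sx)];
        }
}
```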
int fastcv_hal_addWeighted8u(
const uchar* src1_data,
size_t src1_step,
const uchar* src2_data,
size_t src2_step,
uchar* dst_data,
size_t dst_step,
int width,
int height,
const double scalars[3])
{
if( (scalars[0] < -128.0f) || (scalars[0] >= 128.0f) ||
(scalars[1] < -128.0f) || (scalars[1] >= 128.0f) ||
(scalars[2] < -(1<<23))|| (scalars[2] >= 1<<23))
CV_HAL_RETURN_NOT_IMPLEMENTED(
cv::format("Alpha:%f,Beta:%f,Gamma:%f is not supported because it's too large or too small\n",
scalars[0],scalars[1],scalars[2]));
INITIALIZATION_CHECK;
fcvStatus status = FASTCV_SUCCESS;
if (height == 1)
{
src1_step = width*sizeof(uchar);
src2_step = width*sizeof(uchar);
dst_step = width*sizeof(uchar);
cv::parallel_for_(cv::Range(0, width), [&](const cv::Range &range){
int rangeWidth = range.end - range.start;
const uint8_t *src1 = src1_data + range.start;
const uint8_t *src2 = src2_data + range.start;
uint8_t *dst = dst_data + range.start;
fcvAddWeightedu8_v2(src1, rangeWidth, height, src1_step, src2, src2_step,
scalars[0], scalars[1], scalars[2], dst, dst_step);
});
}
else
{
cv::parallel_for_(cv::Range(0, height), [&](const cv::Range &range){
int rangeHeight = range.end - range.start;
const uint8_t *src1 = src1_data + range.start * src1_step;
const uint8_t *src2 = src2_data + range.start * src2_step;
uint8_t *dst = dst_data + range.start * dst_step;
fcvAddWeightedu8_v2(src1, width, rangeHeight, src1_step, src2, src2_step,
scalars[0], scalars[1], scalars[2], dst, dst_step);
});
}
CV_HAL_RETURN(status, hal_addWeighted8u_v2);
}
int fastcv_hal_mul8u(
const uchar *src1_data,
size_t src1_step,
const uchar *src2_data,
size_t src2_step,
uchar *dst_data,
size_t dst_step,
int width,
int height,
double scale)
{
int8_t sF;
if(FCV_CMP_EQ(scale,1.0)) { sF = 0; }
else if(scale > 1.0)
{
if(FCV_CMP_EQ(scale,2.0)) { sF = -1; }
else if(FCV_CMP_EQ(scale,4.0)) { sF = -2; }
else if(FCV_CMP_EQ(scale,8.0)) { sF = -3; }
else if(FCV_CMP_EQ(scale,16.0)) { sF = -4; }
else if(FCV_CMP_EQ(scale,32.0)) { sF = -5; }
else if(FCV_CMP_EQ(scale,64.0)) { sF = -6; }
else if(FCV_CMP_EQ(scale,128.0)) { sF = -7; }
else if(FCV_CMP_EQ(scale,256.0)) { sF = -8; }
else CV_HAL_RETURN_NOT_IMPLEMENTED("scale factor not supported");
}
else if(scale > 0 && scale < 1.0)
{
if(FCV_CMP_EQ(scale,1/2.0)) { sF = 1; }
else if(FCV_CMP_EQ(scale,1/4.0)) { sF = 2; }
else if(FCV_CMP_EQ(scale,1/8.0)) { sF = 3; }
else if(FCV_CMP_EQ(scale,1/16.0)) { sF = 4; }
else if(FCV_CMP_EQ(scale,1/32.0)) { sF = 5; }
else if(FCV_CMP_EQ(scale,1/64.0)) { sF = 6; }
else if(FCV_CMP_EQ(scale,1/128.0)) { sF = 7; }
else if(FCV_CMP_EQ(scale,1/256.0)) { sF = 8; }
else CV_HAL_RETURN_NOT_IMPLEMENTED("scale factor not supported");
}
else
CV_HAL_RETURN_NOT_IMPLEMENTED("scale factor not supported");
INITIALIZATION_CHECK;
int nStripes = cv::getNumThreads();
if(height == 1)
{
cv::parallel_for_(cv::Range(0, width), [&](const cv::Range &range){
int rangeWidth = range.end - range.start;
const uchar* yS1 = src1_data + static_cast<size_t>(range.start);
const uchar* yS2 = src2_data + static_cast<size_t>(range.start);
uchar* yD = dst_data + static_cast<size_t>(range.start);
fcvElementMultiplyu8(yS1, rangeWidth, 1, 0, yS2, 0, sF,
FASTCV_CONVERT_POLICY_SATURATE, yD, 0);
}, nStripes);
}
else
{
cv::parallel_for_(cv::Range(0, height), [&](const cv::Range &range){
int rangeHeight = range.end - range.start;
const uchar* yS1 = src1_data + static_cast<size_t>(range.start)*src1_step;
const uchar* yS2 = src2_data + static_cast<size_t>(range.start)*src2_step;
uchar* yD = dst_data + static_cast<size_t>(range.start)*dst_step;
fcvElementMultiplyu8(yS1, width, rangeHeight, src1_step, yS2, src2_step,
sF, FASTCV_CONVERT_POLICY_SATURATE, yD, dst_step);
}, nStripes);
}
fcvStatus status = FASTCV_SUCCESS;
CV_HAL_RETURN(status, hal_mul8u);
}
int fastcv_hal_mul16s(
const short *src1_data,
size_t src1_step,
const short *src2_data,
size_t src2_step,
short *dst_data,
size_t dst_step,
int width,
int height,
double scale)
{
int8_t sF;
if(FCV_CMP_EQ(scale,1.0)) { sF = 0; }
else if(scale > 1.0)
{
if(FCV_CMP_EQ(scale,2.0)) { sF = -1; }
else if(FCV_CMP_EQ(scale,4.0)) { sF = -2; }
else if(FCV_CMP_EQ(scale,8.0)) { sF = -3; }
else if(FCV_CMP_EQ(scale,16.0)) { sF = -4; }
else if(FCV_CMP_EQ(scale,32.0)) { sF = -5; }
else if(FCV_CMP_EQ(scale,64.0)) { sF = -6; }
else if(FCV_CMP_EQ(scale,128.0)) { sF = -7; }
else if(FCV_CMP_EQ(scale,256.0)) { sF = -8; }
else CV_HAL_RETURN_NOT_IMPLEMENTED("scale factor not supported");
}
else if(scale > 0 && scale < 1.0)
{
if(FCV_CMP_EQ(scale,1/2.0)) { sF = 1; }
else if(FCV_CMP_EQ(scale,1/4.0)) { sF = 2; }
else if(FCV_CMP_EQ(scale,1/8.0)) { sF = 3; }
else if(FCV_CMP_EQ(scale,1/16.0)) { sF = 4; }
else if(FCV_CMP_EQ(scale,1/32.0)) { sF = 5; }
else if(FCV_CMP_EQ(scale,1/64.0)) { sF = 6; }
else if(FCV_CMP_EQ(scale,1/128.0)) { sF = 7; }
else if(FCV_CMP_EQ(scale,1/256.0)) { sF = 8; }
else CV_HAL_RETURN_NOT_IMPLEMENTED("scale factor not supported");
}
else
CV_HAL_RETURN_NOT_IMPLEMENTED("scale factor not supported");
INITIALIZATION_CHECK;
int nStripes = cv::getNumThreads();
if(height == 1)
{
cv::parallel_for_(cv::Range(0, width), [&](const cv::Range &range){
int rangeWidth = range.end - range.start;
const short* yS1 = src1_data + static_cast<size_t>(range.start);
const short* yS2 = src2_data + static_cast<size_t>(range.start);
short* yD = dst_data + static_cast<size_t>(range.start);
fcvElementMultiplys16(yS1, rangeWidth, 1, 0, yS2, 0, sF,
FASTCV_CONVERT_POLICY_SATURATE, yD, 0);
}, nStripes);
}
else
{
cv::parallel_for_(cv::Range(0, height), [&](const cv::Range &range){
int rangeHeight = range.end - range.start;
const short* yS1 = src1_data + static_cast<size_t>(range.start) * (src1_step/sizeof(short));
const short* yS2 = src2_data + static_cast<size_t>(range.start) * (src2_step/sizeof(short));
short* yD = dst_data + static_cast<size_t>(range.start) * (dst_step/sizeof(short));
fcvElementMultiplys16(yS1, width, rangeHeight, src1_step, yS2, src2_step,
sF, FASTCV_CONVERT_POLICY_SATURATE, yD, dst_step);
}, nStripes);
}
fcvStatus status = FASTCV_SUCCESS;
CV_HAL_RETURN(status, hal_mul16s);
}
int fastcv_hal_mul32f(
const float *src1_data,
size_t src1_step,
const float *src2_data,
size_t src2_step,
float *dst_data,
size_t dst_step,
int width,
int height,
double scale)
{
if(!FCV_CMP_EQ(scale,1.0))
CV_HAL_RETURN_NOT_IMPLEMENTED("scale factor not supported");
INITIALIZATION_CHECK;
int nStripes = cv::getNumThreads();
if(height == 1)
{
cv::parallel_for_(cv::Range(0, width), [&](const cv::Range &range){
int rangeWidth = range.end - range.start;
const float* yS1 = src1_data + static_cast<size_t>(range.start);
const float* yS2 = src2_data + static_cast<size_t>(range.start);
float* yD = dst_data + static_cast<size_t>(range.start);
fcvElementMultiplyf32(yS1, rangeWidth, 1, 0, yS2, 0, yD, 0);
}, nStripes);
}
else
{
cv::parallel_for_(cv::Range(0, height), [&](const cv::Range &range){
int rangeHeight = range.end - range.start;
const float* yS1 = src1_data + static_cast<size_t>(range.start) * (src1_step/sizeof(float));
const float* yS2 = src2_data + static_cast<size_t>(range.start) * (src2_step/sizeof(float));
float* yD = dst_data + static_cast<size_t>(range.start) * (dst_step/sizeof(float));
fcvElementMultiplyf32(yS1, width, rangeHeight, src1_step,
yS2, src2_step, yD, dst_step);
}, nStripes);
}
fcvStatus status = FASTCV_SUCCESS;
CV_HAL_RETURN(status, hal_mul32f);
}
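fastcv_hal_rotate only hands 8UC1 and 8UC2 images with at least 120*80 = 9600 pixels to FastCV, and only for 90, 180 and 270 degrees. Anything else returns not-implemented, so OpenCV's generic code runs instead. A plain scalar reference is useful when validating such a fast path. The sketch below covers the single-channel case, with byte strides as in the HAL signature. It assumes 90 means clockwise and 270 counter-clockwise (the cv::ROTATE_90_CLOCKWISE convention). The code above does not confirm which direction FASTCV_ROTATE_90 uses. `rotateRef` is a hypothetical helper, not an OpenCV or FastCV function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for rotating an 8-bit, 1-channel image.
// Assumption: 90 = clockwise, 270 = counter-clockwise.
// src is width x height; dst is height x width for 90/270 and
// width x height for 180. Steps are in bytes.
// Returns false for unsupported angles, mirroring the HAL fallback.
bool rotateRef(const uint8_t* src, size_t src_step, int width, int height,
               uint8_t* dst, size_t dst_step, int angle)
{
    assert(src != nullptr && dst != nullptr);
    if (angle != 90 && angle != 180 && angle != 270)
        return false;
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* srow = src + y * src_step;
        for (int x = 0; x < width; ++x)
        {
            int dx = 0, dy = 0;  // destination column and row
            if (angle == 90)       { dx = height - 1 - y; dy = x; }
            else if (angle == 180) { dx = width - 1 - x;  dy = height - 1 - y; }
            else                   { dx = y;              dy = width - 1 - x; }
            dst[dy * dst_step + dx] = srow[x];
        }
    }
    return true;
}
```

For example, the 3x2 image {1,2,3 / 4,5,6} rotated by 90 becomes the 2x3 image {4,1 / 5,2 / 6,3}.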

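fastcv_hal_addWeighted8u only accepts alpha and beta in [-128, 128) and gamma in [-2^23, 2^23). Outside that range it falls back. The code does not state the reason. A fixed-point representation inside FastCV is a plausible guess, nothing more. The sketch below pairs the same range gate with a scalar reference for the 8-bit result documented for cv::addWeighted, dst = saturate(src1*alpha + src2*beta + gamma). Both functions are hypothetical. Round-to-nearest (ties to even) is an assumption about the reference, and FastCV's own rounding may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Same parameter gate as fastcv_hal_addWeighted8u: false means the
// FastCV path would report "not implemented".
bool addWeightedParamsSupported(double alpha, double beta, double gamma)
{
    const double lim = 1 << 23;
    return alpha >= -128.0 && alpha < 128.0 &&
           beta  >= -128.0 && beta  < 128.0 &&
           gamma >= -lim   && gamma < lim;
}

// Hypothetical scalar reference: dst = saturate(round(a*alpha + b*beta + gamma)).
void addWeightedRef(const uint8_t* src1, size_t step1,
                    const uint8_t* src2, size_t step2,
                    uint8_t* dst, size_t dst_step,
                    int width, int height,
                    double alpha, double beta, double gamma)
{
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* a = src1 + y * step1;
        const uint8_t* b = src2 + y * step2;
        uint8_t* d = dst + y * dst_step;
        for (int x = 0; x < width; ++x)
        {
            // nearbyint uses the current rounding mode (ties to even by default)
            double v = std::nearbyint(a[x] * alpha + b[x] * beta + gamma);
            d[x] = static_cast<uint8_t>(std::clamp(v, 0.0, 255.0));
        }
    }
}
```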
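fastcv_hal_mul8u and fastcv_hal_mul16s accept only power-of-two scales from 1/256 to 256. They translate the scale into FastCV's shift factor sF, where scale = 2^(-sF): 2.0 gives -1 and 0.25 gives 2. The if/else chain can be written as a loop over the allowed exponents. In the sketch, `scaleToShift` is hypothetical. `approxEqual` is an assumed stand-in for FCV_CMP_EQ, whose definition is not part of this diff.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed stand-in for FCV_CMP_EQ: relative approximate equality.
static bool approxEqual(double a, double b)
{
    return std::fabs(a - b) <= 1e-9 * std::fabs(b);
}

// Hypothetical helper: finds sF in [-8, 8] with scale == 2^(-sF).
// Returns false for any other scale, where the HAL falls back to OpenCV.
bool scaleToShift(double scale, int8_t* sF)
{
    for (int s = -8; s <= 8; ++s)
    {
        if (approxEqual(scale, std::ldexp(1.0, -s)))  // ldexp(1, -s) == 2^(-s)
        {
            *sF = static_cast<int8_t>(s);
            return true;
        }
    }
    return false;
}
```

fastcv_hal_mul32f is stricter and only takes scale == 1.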
3rdparty/fastcv/src/fastcv_hal_imgproc.cpp vendored Normal file

File diff suppressed because it is too large

@ -0,0 +1,56 @@
/*
* Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
* SPDX-License-Identifier: Apache-2.0
*/
#include "fastcv_hal_utils.hpp"
const char* getFastCVErrorString(int status)
{
switch(status)
{
case FASTCV_SUCCESS: return "Successful";
case FASTCV_EFAIL: return "General failure";
case FASTCV_EUNALIGNPARAM: return "Unaligned pointer parameter";
case FASTCV_EBADPARAM: return "Bad parameters";
case FASTCV_EINVALSTATE: return "Called at invalid state";
case FASTCV_ENORES: return "Insufficient resources, memory, thread, etc";
case FASTCV_EUNSUPPORTED: return "Unsupported feature";
case FASTCV_EHWQDSP: return "Hardware QDSP failed to respond";
case FASTCV_EHWGPU: return "Hardware GPU failed to respond";
default: return "Unknown FastCV Error";
}
}
const char* borderToString(int border)
{
switch (border)
{
case 0: return "BORDER_CONSTANT";
case 1: return "BORDER_REPLICATE";
case 2: return "BORDER_REFLECT";
case 3: return "BORDER_WRAP";
case 4: return "BORDER_REFLECT_101";
case 5: return "BORDER_TRANSPARENT";
default: return "Unknown border type";
}
}
const char* interpolationToString(int interpolation)
{
switch (interpolation)
{
case 0: return "INTER_NEAREST";
case 1: return "INTER_LINEAR";
case 2: return "INTER_CUBIC";
case 3: return "INTER_AREA";
case 4: return "INTER_LANCZOS4";
case 5: return "INTER_LINEAR_EXACT";
case 6: return "INTER_NEAREST_EXACT";
case 7: return "INTER_MAX";
case 8: return "WARP_FILL_OUTLIERS";
case 16: return "WARP_INVERSE_MAP";
case 32: return "WARP_RELATIVE_MAP";
default: return "Unknown interpolation type";
}
}

@ -1,8 +1,8 @@
-# Binaries branch name: ffmpeg/4.x_20240522
-# Binaries were created for OpenCV: 8393885a39dac1e650bf5d0aaff84c04ad8bcdd3
-ocv_update(FFMPEG_BINARIES_COMMIT "394dca6ceb3085c979415e6385996b6570e94153")
-ocv_update(FFMPEG_FILE_HASH_BIN32 "bdfbd1efb295f3e54c07d2cb7a843bf9")
-ocv_update(FFMPEG_FILE_HASH_BIN64 "bfef029900f788480a363d6dc05c4f0e")
+# Binaries branch name: ffmpeg/4.x_20241226
+# Binaries were created for OpenCV: 09892c9d1706f40342bda0bc404580f63492d9f8
+ocv_update(FFMPEG_BINARIES_COMMIT "d63d7c154c57242bf2283be61166be2bd30ec47e")
+ocv_update(FFMPEG_FILE_HASH_BIN32 "642b94d032a8292b07550126934173f6")
+ocv_update(FFMPEG_FILE_HASH_BIN64 "a8c3560c8f20e1ae465bef81580fa92c")
 ocv_update(FFMPEG_FILE_HASH_CMAKE "8862c87496e2e8c375965e1277dee1c7")
 function(download_win_ffmpeg script_var)

@ -19,4 +19,15 @@
#include "version/hal_rvv_071.hpp"
#endif
#endif
#if defined(__riscv_v) && __riscv_v == 1000000
#include "hal_rvv_1p0/merge.hpp" // core
#include "hal_rvv_1p0/mean.hpp" // core
#include "hal_rvv_1p0/norm.hpp" // core
#include "hal_rvv_1p0/norm_diff.hpp" // core
#include "hal_rvv_1p0/convert_scale.hpp" // core
#include "hal_rvv_1p0/minmax.hpp" // core
#include "hal_rvv_1p0/atan.hpp" // core
#include "hal_rvv_1p0/split.hpp" // core
#endif
#endif
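Each hal_rvv_1p0 header included above plugs into OpenCV's HAL the same way. It #undefs a cv_hal_* macro and redefines it to its own function. For instance, atan.hpp maps cv_hal_fastAtan32f to cv::cv_hal_rvv::fast_atan_32. A function that cannot handle an input returns CV_HAL_ERROR_NOT_IMPLEMENTED, so the caller runs its generic code instead. The sketch below models that mechanism in isolation. Every name in it is hypothetical. The default stub and the caller are simplified stand-ins for OpenCV's own machinery, not copies of it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes standing in for CV_HAL_ERROR_OK / _NOT_IMPLEMENTED.
enum { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

// 1. Default binding: a stub that always declines.
inline int hal_ni_square(const float*, float*, size_t) { return HAL_NOT_IMPLEMENTED; }
#define my_hal_square hal_ni_square

// 2. An accelerated "HAL header" rebinds the macro to its own function.
namespace accel {
inline int square(const float* src, float* dst, size_t n)
{
    if (n < 4) return HAL_NOT_IMPLEMENTED;  // decline small inputs
    for (size_t i = 0; i < n; ++i) dst[i] = src[i] * src[i];
    return HAL_OK;
}
}
#undef my_hal_square
#define my_hal_square accel::square

// 3. The caller tries the HAL first and falls back to generic code.
int genericPathCount = 0;  // counts generic-path runs, for the demo only
void square(const float* src, float* dst, size_t n)
{
    if (my_hal_square(src, dst, n) == HAL_OK)
        return;
    ++genericPathCount;
    for (size_t i = 0; i < n; ++i) dst[i] = src[i] * src[i];
}
```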

3rdparty/hal_rvv/hal_rvv_1p0/atan.hpp vendored Normal file
@ -0,0 +1,128 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level
// directory of this distribution and at http://opencv.org/license.html.
#pragma once
#undef cv_hal_fastAtan32f
#define cv_hal_fastAtan32f cv::cv_hal_rvv::fast_atan_32
#undef cv_hal_fastAtan64f
#define cv_hal_fastAtan64f cv::cv_hal_rvv::fast_atan_64
#include <riscv_vector.h>
#include <cfloat>
namespace cv::cv_hal_rvv {
namespace detail {
// ref: mathfuncs_core.simd.hpp
static constexpr float pi = CV_PI;
static constexpr float atan2_p1 = 0.9997878412794807F * (180 / pi);
static constexpr float atan2_p3 = -0.3258083974640975F * (180 / pi);
static constexpr float atan2_p5 = 0.1555786518463281F * (180 / pi);
static constexpr float atan2_p7 = -0.04432655554792128F * (180 / pi);
__attribute__((always_inline)) inline vfloat32m4_t
rvv_atan_f32(vfloat32m4_t vy, vfloat32m4_t vx, size_t vl, float p7,
vfloat32m4_t vp5, vfloat32m4_t vp3, vfloat32m4_t vp1,
float angle_90_deg) {
const auto ax = __riscv_vfabs(vx, vl);
const auto ay = __riscv_vfabs(vy, vl);
const auto c = __riscv_vfdiv(
__riscv_vfmin(ax, ay, vl),
__riscv_vfadd(__riscv_vfmax(ax, ay, vl), FLT_EPSILON, vl), vl);
const auto c2 = __riscv_vfmul(c, c, vl);
auto a = __riscv_vfmadd(c2, p7, vp5, vl);
a = __riscv_vfmadd(a, c2, vp3, vl);
a = __riscv_vfmadd(a, c2, vp1, vl);
a = __riscv_vfmul(a, c, vl);
const auto mask = __riscv_vmflt(ax, ay, vl);
a = __riscv_vfrsub_mu(mask, a, a, angle_90_deg, vl);
a = __riscv_vfrsub_mu(__riscv_vmflt(vx, 0.F, vl), a, a, angle_90_deg * 2,
vl);
a = __riscv_vfrsub_mu(__riscv_vmflt(vy, 0.F, vl), a, a, angle_90_deg * 4,
vl);
return a;
}
} // namespace detail
inline int fast_atan_32(const float *y, const float *x, float *dst, size_t n,
bool angle_in_deg) {
const float scale = angle_in_deg ? 1.f : CV_PI / 180.f;
const float p1 = detail::atan2_p1 * scale;
const float p3 = detail::atan2_p3 * scale;
const float p5 = detail::atan2_p5 * scale;
const float p7 = detail::atan2_p7 * scale;
const float angle_90_deg = 90.F * scale;
static size_t vlmax = __riscv_vsetvlmax_e32m4();
auto vp1 = __riscv_vfmv_v_f_f32m4(p1, vlmax);
auto vp3 = __riscv_vfmv_v_f_f32m4(p3, vlmax);
auto vp5 = __riscv_vfmv_v_f_f32m4(p5, vlmax);
for (size_t vl{}; n > 0; n -= vl) {
vl = __riscv_vsetvl_e32m4(n);
auto vy = __riscv_vle32_v_f32m4(y, vl);
auto vx = __riscv_vle32_v_f32m4(x, vl);
auto a =
detail::rvv_atan_f32(vy, vx, vl, p7, vp5, vp3, vp1, angle_90_deg);
__riscv_vse32(dst, a, vl);
x += vl;
y += vl;
dst += vl;
}
return CV_HAL_ERROR_OK;
}
inline int fast_atan_64(const double *y, const double *x, double *dst, size_t n,
bool angle_in_deg) {
// this also uses float32 version, ref: mathfuncs_core.simd.hpp
const float scale = angle_in_deg ? 1.f : CV_PI / 180.f;
const float p1 = detail::atan2_p1 * scale;
const float p3 = detail::atan2_p3 * scale;
const float p5 = detail::atan2_p5 * scale;
const float p7 = detail::atan2_p7 * scale;
const float angle_90_deg = 90.F * scale;
static size_t vlmax = __riscv_vsetvlmax_e32m4();
auto vp1 = __riscv_vfmv_v_f_f32m4(p1, vlmax);
auto vp3 = __riscv_vfmv_v_f_f32m4(p3, vlmax);
auto vp5 = __riscv_vfmv_v_f_f32m4(p5, vlmax);
for (size_t vl{}; n > 0; n -= vl) {
vl = __riscv_vsetvl_e64m8(n);
auto wy = __riscv_vle64_v_f64m8(y, vl);
auto wx = __riscv_vle64_v_f64m8(x, vl);
auto vy = __riscv_vfncvt_f_f_w_f32m4(wy, vl);
auto vx = __riscv_vfncvt_f_f_w_f32m4(wx, vl);
auto a =
detail::rvv_atan_f32(vy, vx, vl, p7, vp5, vp3, vp1, angle_90_deg);
auto wa = __riscv_vfwcvt_f_f_v_f64m8(a, vl);
__riscv_vse64(dst, wa, vl);
x += vl;
y += vl;
dst += vl;
}
return CV_HAL_ERROR_OK;
}
} // namespace cv::cv_hal_rvv
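rvv_atan_f32 works in three steps. It computes c = min(|x|, |y|) / (max(|x|, |y|) + FLT_EPSILON). It then evaluates the odd polynomial p1*c + p3*c^3 + p5*c^5 + p7*c^7 in Horner form with three fused multiply-adds. Finally, it maps the result into [0, 360) with three masked reverse subtractions: 90 - a when |x| < |y|, 180 - a when x < 0, and 360 - a when y < 0. A scalar transcription in degrees is handy as a reference when testing the vector code. `fastAtan2Deg` is a hypothetical name, and the coefficients are copied from atan.hpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>

// Scalar transcription of detail::rvv_atan_f32, result in degrees.
float fastAtan2Deg(float y, float x)
{
    const float rad2deg = 180.0f / 3.14159265358979f;
    const float p1 = 0.9997878412794807f * rad2deg;
    const float p3 = -0.3258083974640975f * rad2deg;
    const float p5 = 0.1555786518463281f * rad2deg;
    const float p7 = -0.04432655554792128f * rad2deg;

    const float ax = std::fabs(x), ay = std::fabs(y);
    // Ratio in [0, 1]; FLT_EPSILON avoids 0/0 at the origin.
    const float c = std::min(ax, ay) / (std::max(ax, ay) + FLT_EPSILON);
    const float c2 = c * c;
    // Horner form of p1*c + p3*c^3 + p5*c^5 + p7*c^7.
    float a = (((p7 * c2 + p5) * c2 + p3) * c2 + p1) * c;

    if (ax < ay)  a = 90.0f - a;   // angle was measured from the y axis
    if (x < 0.0f) a = 180.0f - a;  // left half-plane
    if (y < 0.0f) a = 360.0f - a;  // lower half-plane
    return a;
}
```

fast_atan_32 and fast_atan_64 produce radians by pre-multiplying every coefficient and the 90-degree constant by pi/180. fast_atan_64 narrows its inputs to float first, so it is no more accurate than the 32-bit path.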

@ -0,0 +1,120 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_CONVERT_SCALE_HPP_INCLUDED
#define OPENCV_HAL_RVV_CONVERT_SCALE_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_convertScale
#define cv_hal_convertScale cv::cv_hal_rvv::convertScale
inline int convertScale_8U8U(const uchar* src, size_t src_step, uchar* dst, size_t dst_step, int width, int height, double alpha, double beta)
{
int vlmax = __riscv_vsetvlmax_e32m8();
auto vec_b = __riscv_vfmv_v_f_f32m8(beta, vlmax);
float a = alpha;
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
uchar* dst_row = dst + i * dst_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_src_u16 = __riscv_vzext_vf2(vec_src, vl);
auto vec_src_f32 = __riscv_vfwcvt_f(vec_src_u16, vl);
auto vec_fma = __riscv_vfmadd(vec_src_f32, a, vec_b, vl);
auto vec_dst_u16 = __riscv_vfncvt_xu(vec_fma, vl);
auto vec_dst = __riscv_vnclipu(vec_dst_u16, 0, __RISCV_VXRM_RNU, vl);
__riscv_vse8_v_u8m2(dst_row + j, vec_dst, vl);
}
}
return CV_HAL_ERROR_OK;
}
inline int convertScale_8U32F(const uchar* src, size_t src_step, uchar* dst, size_t dst_step, int width, int height, double alpha, double beta)
{
int vlmax = __riscv_vsetvlmax_e32m8();
auto vec_b = __riscv_vfmv_v_f_f32m8(beta, vlmax);
float a = alpha;
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
float* dst_row = reinterpret_cast<float*>(dst + i * dst_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_src_u16 = __riscv_vzext_vf2(vec_src, vl);
auto vec_src_f32 = __riscv_vfwcvt_f(vec_src_u16, vl);
auto vec_fma = __riscv_vfmadd(vec_src_f32, a, vec_b, vl);
__riscv_vse32_v_f32m8(dst_row + j, vec_fma, vl);
}
}
return CV_HAL_ERROR_OK;
}
inline int convertScale_32F32F(const uchar* src, size_t src_step, uchar* dst, size_t dst_step, int width, int height, double alpha, double beta)
{
int vlmax = __riscv_vsetvlmax_e32m8();
auto vec_b = __riscv_vfmv_v_f_f32m8(beta, vlmax);
float a = alpha;
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
float* dst_row = reinterpret_cast<float*>(dst + i * dst_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src = __riscv_vle32_v_f32m8(src_row + j, vl);
auto vec_fma = __riscv_vfmadd(vec_src, a, vec_b, vl);
__riscv_vse32_v_f32m8(dst_row + j, vec_fma, vl);
}
}
return CV_HAL_ERROR_OK;
}
inline int convertScale(const uchar* src, size_t src_step, uchar* dst, size_t dst_step, int width, int height,
int sdepth, int ddepth, double alpha, double beta)
{
if (!dst)
return CV_HAL_ERROR_OK;
switch (sdepth)
{
case CV_8U:
switch (ddepth)
{
case CV_8U:
return convertScale_8U8U(src, src_step, dst, dst_step, width, height, alpha, beta);
case CV_32F:
return convertScale_8U32F(src, src_step, dst, dst_step, width, height, alpha, beta);
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
case CV_32F:
switch (ddepth)
{
case CV_32F:
return convertScale_32F32F(src, src_step, dst, dst_step, width, height, alpha, beta);
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
}}
#endif
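convertScale_8U8U widens each byte to float. It then computes src*alpha + beta with a single-precision fused multiply-add, after narrowing alpha and beta to float. The result is converted back to unsigned 16-bit and narrowed to 8 bits with vnclipu. Under RISC-V's float-to-unsigned conversion rules, negative results become 0, and the final narrowing saturates at 255. The net effect is a round-and-saturate to [0, 255]. The scalar sketch below reproduces the same arithmetic, assuming the default round-to-nearest-even rounding mode. `convertScale8uRef` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for the 8U -> 8U convertScale path.
void convertScale8uRef(const uint8_t* src, size_t src_step,
                       uint8_t* dst, size_t dst_step,
                       int width, int height, double alpha, double beta)
{
    const float a = static_cast<float>(alpha);  // the RVV code also works in float
    const float b = static_cast<float>(beta);
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* s = src + y * src_step;
        uint8_t* d = dst + y * dst_step;
        for (int x = 0; x < width; ++x)
        {
            float v = std::fma(static_cast<float>(s[x]), a, b);  // one rounding, like vfmadd
            long r = std::lrint(v);  // nearest, ties to even in the default mode
            d[x] = static_cast<uint8_t>(std::clamp(r, 0L, 255L));
        }
    }
}
```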

3rdparty/hal_rvv/hal_rvv_1p0/mean.hpp vendored Normal file
@ -0,0 +1,228 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_MEANSTDDEV_HPP_INCLUDED
#define OPENCV_HAL_RVV_MEANSTDDEV_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_meanStdDev
#define cv_hal_meanStdDev cv::cv_hal_rvv::meanStdDev
inline int meanStdDev_8UC1(const uchar* src_data, size_t src_step, int width, int height,
double* mean_val, double* stddev_val, uchar* mask, size_t mask_step);
inline int meanStdDev_8UC4(const uchar* src_data, size_t src_step, int width, int height,
double* mean_val, double* stddev_val, uchar* mask, size_t mask_step);
inline int meanStdDev_32FC1(const uchar* src_data, size_t src_step, int width, int height,
double* mean_val, double* stddev_val, uchar* mask, size_t mask_step);
inline int meanStdDev(const uchar* src_data, size_t src_step, int width, int height,
int src_type, double* mean_val, double* stddev_val, uchar* mask, size_t mask_step) {
switch (src_type)
{
case CV_8UC1:
return meanStdDev_8UC1(src_data, src_step, width, height, mean_val, stddev_val, mask, mask_step);
case CV_8UC4:
return meanStdDev_8UC4(src_data, src_step, width, height, mean_val, stddev_val, mask, mask_step);
case CV_32FC1:
return meanStdDev_32FC1(src_data, src_step, width, height, mean_val, stddev_val, mask, mask_step);
default:
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
}
inline int meanStdDev_8UC1(const uchar* src_data, size_t src_step, int width, int height,
double* mean_val, double* stddev_val, uchar* mask, size_t mask_step) {
int nz = 0;
int vlmax = __riscv_vsetvlmax_e64m8();
vuint64m8_t vec_sum = __riscv_vmv_v_x_u64m8(0, vlmax);
vuint64m8_t vec_sqsum = __riscv_vmv_v_x_u64m8(0, vlmax);
if (mask) {
for (int i = 0; i < height; ++i) {
const uchar* src_row = src_data + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int j = 0, vl;
for ( ; j < width; j += vl) {
vl = __riscv_vsetvl_e8m1(width - j);
auto vec_pixel_u8 = __riscv_vle8_v_u8m1(src_row + j, vl);
auto vmask_u8 = __riscv_vle8_v_u8m1(mask_row+j, vl);
auto vec_pixel = __riscv_vzext_vf4(vec_pixel_u8, vl);
auto vmask = __riscv_vmsne_vx_u8m1_b8(vmask_u8, 0, vl); // any nonzero mask byte selects the pixel
vec_sum = __riscv_vwaddu_wv_u64m8_tumu(vmask, vec_sum, vec_sum, vec_pixel, vl);
vec_sqsum = __riscv_vwmaccu_vv_u64m8_tumu(vmask, vec_sqsum, vec_pixel, vec_pixel, vl);
nz += __riscv_vcpop_m_b8(vmask, vl);
}
}
} else {
for (int i = 0; i < height; i++) {
const uchar* src_row = src_data + i * src_step;
int j = 0, vl;
for ( ; j < width; j += vl) {
vl = __riscv_vsetvl_e8m1(width - j);
auto vec_pixel_u8 = __riscv_vle8_v_u8m1(src_row + j, vl);
auto vec_pixel = __riscv_vzext_vf4(vec_pixel_u8, vl);
vec_sum = __riscv_vwaddu_wv_u64m8_tu(vec_sum, vec_sum, vec_pixel, vl);
vec_sqsum = __riscv_vwmaccu_vv_u64m8_tu(vec_sqsum, vec_pixel, vec_pixel, vl);
}
}
nz = height * width;
}
if (nz == 0) {
if (mean_val) *mean_val = 0.0;
if (stddev_val) *stddev_val = 0.0;
return CV_HAL_ERROR_OK;
}
auto zero = __riscv_vmv_s_x_u64m1(0, vlmax);
auto vec_red = __riscv_vmv_v_x_u64m1(0, vlmax);
auto vec_reddev = __riscv_vmv_v_x_u64m1(0, vlmax);
vec_red = __riscv_vredsum(vec_sum, zero, vlmax);
vec_reddev = __riscv_vredsum(vec_sqsum, zero, vlmax);
double sum = __riscv_vmv_x(vec_red);
double mean = sum / nz;
if (mean_val) {
*mean_val = mean;
}
if (stddev_val) {
double sqsum = __riscv_vmv_x(vec_reddev);
double variance = std::max((sqsum / nz) - (mean * mean), 0.0);
double stddev = std::sqrt(variance);
*stddev_val = stddev;
}
return CV_HAL_ERROR_OK;
}
inline int meanStdDev_8UC4(const uchar* src_data, size_t src_step, int width, int height,
double* mean_val, double* stddev_val, uchar* mask, size_t mask_step) {
int nz = 0;
int vlmax = __riscv_vsetvlmax_e64m8();
vuint64m8_t vec_sum = __riscv_vmv_v_x_u64m8(0, vlmax);
vuint64m8_t vec_sqsum = __riscv_vmv_v_x_u64m8(0, vlmax);
if (mask) {
for (int i = 0; i < height; ++i) {
const uchar* src_row = src_data + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int j = 0, jm = 0, vl, vlm;
for ( ; j < width*4; j += vl, jm += vlm) {
vl = __riscv_vsetvl_e8m1(width*4 - j);
vlm = __riscv_vsetvl_e8mf4(width - jm);
auto vec_pixel_u8 = __riscv_vle8_v_u8m1(src_row + j, vl);
auto vmask_u8mf4 = __riscv_vle8_v_u8mf4(mask_row + jm, vlm);
auto vmask_u32 = __riscv_vzext_vf4(vmask_u8mf4, vlm);
// replicate each mask byte m into the 4 channel bytes of its pixel: m -> mmmm
vmask_u32 = __riscv_vmul(vmask_u32, 0b00000001000000010000000100000001, vlm);
auto vmask_u8 = __riscv_vreinterpret_u8m1(vmask_u32);
auto vec_pixel = __riscv_vzext_vf4(vec_pixel_u8, vl);
auto vmask = __riscv_vmsne_vx_u8m1_b8(vmask_u8, 0, vl);
vec_sum = __riscv_vwaddu_wv_u64m8_tumu(vmask, vec_sum, vec_sum, vec_pixel, vl);
vec_sqsum = __riscv_vwmaccu_vv_u64m8_tumu(vmask, vec_sqsum, vec_pixel, vec_pixel, vl);
nz += __riscv_vcpop_m_b8(vmask, vl);
}
}
nz /= 4;
} else {
for (int i = 0; i < height; i++) {
const uchar* src_row = src_data + i * src_step;
int j = 0, vl;
for ( ; j < width*4; j += vl) {
vl = __riscv_vsetvl_e8m1(width*4 - j);
auto vec_pixel_u8 = __riscv_vle8_v_u8m1(src_row + j, vl);
auto vec_pixel = __riscv_vzext_vf4(vec_pixel_u8, vl);
vec_sum = __riscv_vwaddu_wv_u64m8_tu(vec_sum, vec_sum, vec_pixel, vl);
vec_sqsum = __riscv_vwmaccu_vv_u64m8_tu(vec_sqsum, vec_pixel, vec_pixel, vl);
}
}
nz = height * width;
}
if (nz == 0) {
if (mean_val) { for (int c = 0; c < 4; ++c) mean_val[c] = 0.0; }
if (stddev_val) { for (int c = 0; c < 4; ++c) stddev_val[c] = 0.0; }
return CV_HAL_ERROR_OK;
}
uint64_t s[256], sq[256], sum[4] = {0}, sqsum[4] = {0};
__riscv_vse64(s, vec_sum, vlmax);
__riscv_vse64(sq, vec_sqsum, vlmax);
for (int i = 0; i < vlmax; ++i)
{
sum[i % 4] += s[i];
sqsum[i % 4] += sq[i];
}
double mean[4];
for (int c = 0; c < 4; ++c) mean[c] = (double)sum[c] / nz;
if (mean_val) {
for (int c = 0; c < 4; ++c) mean_val[c] = mean[c];
}
if (stddev_val) {
// use the local means so stddev_val is valid even when mean_val is null
for (int c = 0; c < 4; ++c) stddev_val[c] = std::sqrt(std::max(((double)sqsum[c] / nz) - (mean[c] * mean[c]), 0.0));
}
return CV_HAL_ERROR_OK;
}
inline int meanStdDev_32FC1(const uchar* src_data, size_t src_step, int width, int height,
double* mean_val, double* stddev_val, uchar* mask, size_t mask_step) {
int nz = 0;
int vlmax = __riscv_vsetvlmax_e64m4();
vfloat64m4_t vec_sum = __riscv_vfmv_v_f_f64m4(0, vlmax);
vfloat64m4_t vec_sqsum = __riscv_vfmv_v_f_f64m4(0, vlmax);
src_step /= sizeof(float);
if (mask) {
for (int i = 0; i < height; ++i) {
const float* src_row0 = reinterpret_cast<const float*>(src_data) + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int j = 0, vl;
for ( ; j < width; j += vl) {
vl = __riscv_vsetvl_e32m2(width - j);
auto vec_pixel = __riscv_vle32_v_f32m2(src_row0 + j, vl);
auto vmask_u8 = __riscv_vle8_v_u8mf2(mask_row + j, vl);
auto vmask_u32 = __riscv_vzext_vf4(vmask_u8, vl);
auto vmask = __riscv_vmsne_vx_u32m2_b16(vmask_u32, 0, vl); // any nonzero mask byte selects the pixel
vec_sum = __riscv_vfwadd_wv_f64m4_tumu(vmask, vec_sum, vec_sum, vec_pixel, vl);
vec_sqsum = __riscv_vfwmacc_vv_f64m4_tumu(vmask, vec_sqsum, vec_pixel, vec_pixel, vl);
nz += __riscv_vcpop_m_b16(vmask, vl);
}
}
} else {
for (int i = 0; i < height; i++) {
const float* src_row0 = reinterpret_cast<const float*>(src_data) + i * src_step;
int j = 0, vl;
for ( ; j < width; j += vl) {
vl = __riscv_vsetvl_e32m2(width - j);
auto vec_pixel = __riscv_vle32_v_f32m2(src_row0 + j, vl);
vec_sum = __riscv_vfwadd_wv_f64m4_tu(vec_sum, vec_sum, vec_pixel, vl);
vec_sqsum = __riscv_vfwmacc_vv_f64m4_tu(vec_sqsum, vec_pixel, vec_pixel, vl);
}
}
nz = height * width;
}
if (nz == 0) {
if (mean_val) *mean_val = 0.0;
if (stddev_val) *stddev_val = 0.0;
return CV_HAL_ERROR_OK;
}
auto zero = __riscv_vfmv_v_f_f64m1(0, vlmax);
auto vec_red = __riscv_vfmv_v_f_f64m1(0, vlmax);
auto vec_reddev = __riscv_vfmv_v_f_f64m1(0, vlmax);
vec_red = __riscv_vfredusum(vec_sum, zero, vlmax);
vec_reddev = __riscv_vfredusum(vec_sqsum, zero, vlmax);
double sum = __riscv_vfmv_f(vec_red);
double mean = sum / nz;
if (mean_val) {
*mean_val = mean;
}
if (stddev_val) {
double sqsum = __riscv_vfmv_f(vec_reddev);
double variance = std::max((sqsum / nz) - (mean * mean), 0.0);
double stddev = std::sqrt(variance);
*stddev_val = stddev;
}
return CV_HAL_ERROR_OK;
}
}}
#endif
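The meanStdDev kernels accumulate a sum and a sum of squares in 64-bit lanes and count the selected pixels. They then derive the standard deviation as sqrt(max(E[x^2] - mean^2, 0)), where the clamp absorbs small negative rounding residue. Below is a scalar reference for the single-channel 8-bit case. It treats any nonzero mask byte as selected, which is the usual OpenCV mask convention. `meanStdDev8uRef` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for the 8UC1 case.
// mask may be null; otherwise nonzero mask bytes select pixels.
void meanStdDev8uRef(const uint8_t* src, size_t src_step, int width, int height,
                     double* mean_val, double* stddev_val,
                     const uint8_t* mask, size_t mask_step)
{
    uint64_t sum = 0, sqsum = 0, nz = 0;
    for (int y = 0; y < height; ++y)
    {
        const uint8_t* s = src + y * src_step;
        const uint8_t* m = mask ? mask + y * mask_step : nullptr;
        for (int x = 0; x < width; ++x)
        {
            if (m && !m[x]) continue;  // pixel not selected by the mask
            sum += s[x];
            sqsum += uint64_t(s[x]) * s[x];
            ++nz;
        }
    }
    double mean = 0.0, stddev = 0.0;  // empty selection yields zeros, as in the kernels
    if (nz > 0)
    {
        mean = double(sum) / nz;
        // Var = E[x^2] - mean^2, clamped at 0 against rounding residue.
        stddev = std::sqrt(std::max(double(sqsum) / nz - mean * mean, 0.0));
    }
    if (mean_val) *mean_val = mean;
    if (stddev_val) *stddev_val = stddev;
}
```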

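meanStdDev_8UC4 does not de-interleave channels while accumulating. Each vector lane k adds byte j + k of the interleaved row, and only at the end are lanes folded into channels with sum[i % 4] += s[i]. This is correct as long as every chunk starts at a multiple of 4 bytes. That holds when vsetvl returns min(remaining, VLMAX) and VLMAX is a multiple of 4, which is what common implementations do. The RVV specification also permits a smaller vl when the remaining count is below 2*VLMAX, so this appears to be an assumption of the kernel rather than a guarantee. The simulation below reproduces the scheme in scalar code for one row. `sumPerChannel4` and its lane-count parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical simulation of the per-lane accumulation in meanStdDev_8UC4.
// kLanes plays the role of VLMAX and must be a multiple of 4.
template <size_t kLanes>
void sumPerChannel4(const uint8_t* bgra, size_t pixels, uint64_t out[4])
{
    static_assert(kLanes % 4 == 0, "lane count must be a multiple of 4");
    uint64_t lanes[kLanes] = {};
    const size_t total = pixels * 4;
    for (size_t j = 0; j < total; )
    {
        // Assumes vl = min(remaining, VLMAX), so j stays a multiple of 4.
        const size_t vl = std::min(total - j, kLanes);
        for (size_t k = 0; k < vl; ++k)
            lanes[k] += bgra[j + k];  // lane k always sees channel k % 4
        j += vl;
    }
    for (int c = 0; c < 4; ++c) out[c] = 0;
    for (size_t i = 0; i < kLanes; ++i)
        out[i % 4] += lanes[i];       // the final fold, as in sum[i % 4] += s[i]
}
```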
3rdparty/hal_rvv/hal_rvv_1p0/merge.hpp vendored Normal file
@ -0,0 +1,397 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_MERGE_HPP_INCLUDED
#define OPENCV_HAL_RVV_MERGE_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_merge8u
#define cv_hal_merge8u cv::cv_hal_rvv::merge8u
#undef cv_hal_merge16u
#define cv_hal_merge16u cv::cv_hal_rvv::merge16u
#undef cv_hal_merge32s
#define cv_hal_merge32s cv::cv_hal_rvv::merge32s
#undef cv_hal_merge64s
#define cv_hal_merge64s cv::cv_hal_rvv::merge64s
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge8u(const uchar** src, uchar* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i = 0;
int vl = __riscv_vsetvlmax_e8m1();
if( k == 1 )
{
const uchar* src0 = src[0];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++)
dst[i*cn] = src0[i];
}
else if( k == 2 )
{
const uchar *src0 = src[0], *src1 = src[1];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 1, sizeof(uchar)*cn, b, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
}
}
else if( k == 3 )
{
const uchar *src0 = src[0], *src1 = src[1], *src2 = src[2];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
auto c = __riscv_vle8_v_u8m1(src2 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 1, sizeof(uchar)*cn, b, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 2, sizeof(uchar)*cn, c, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
}
}
else
{
const uchar *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
auto c = __riscv_vle8_v_u8m1(src2 + i, vl);
auto d = __riscv_vle8_v_u8m1(src3 + i, vl);
__riscv_vsse8_v_u8m1(dst + i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 1, sizeof(uchar)*cn, b, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 2, sizeof(uchar)*cn, c, vl);
__riscv_vsse8_v_u8m1(dst + i*cn + 3, sizeof(uchar)*cn, d, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
dst[i*cn+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const uchar *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
i = 0;
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle8_v_u8m1(src0 + i, vl);
auto b = __riscv_vle8_v_u8m1(src1 + i, vl);
auto c = __riscv_vle8_v_u8m1(src2 + i, vl);
auto d = __riscv_vle8_v_u8m1(src3 + i, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn, sizeof(uchar)*cn, a, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn + 1, sizeof(uchar)*cn, b, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn + 2, sizeof(uchar)*cn, c, vl);
__riscv_vsse8_v_u8m1(dst + k+i*cn + 3, sizeof(uchar)*cn, d, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[k+i*cn] = src0[i];
dst[k+i*cn+1] = src1[i];
dst[k+i*cn+2] = src2[i];
dst[k+i*cn+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge16u(const ushort** src, ushort* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i = 0;
int vl = __riscv_vsetvlmax_e16m1();
if( k == 1 )
{
const ushort* src0 = src[0];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++)
dst[i*cn] = src0[i];
}
else if( k == 2 )
{
const ushort *src0 = src[0], *src1 = src[1];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 1, sizeof(ushort)*cn, b, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
}
}
else if( k == 3 )
{
const ushort *src0 = src[0], *src1 = src[1], *src2 = src[2];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
auto c = __riscv_vle16_v_u16m1(src2 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 1, sizeof(ushort)*cn, b, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 2, sizeof(ushort)*cn, c, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
}
}
else
{
const ushort *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
auto c = __riscv_vle16_v_u16m1(src2 + i, vl);
auto d = __riscv_vle16_v_u16m1(src3 + i, vl);
__riscv_vsse16_v_u16m1(dst + i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 1, sizeof(ushort)*cn, b, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 2, sizeof(ushort)*cn, c, vl);
__riscv_vsse16_v_u16m1(dst + i*cn + 3, sizeof(ushort)*cn, d, vl);
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++ )
{
dst[i*cn] = src0[i];
dst[i*cn+1] = src1[i];
dst[i*cn+2] = src2[i];
dst[i*cn+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const uint16_t *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
i = 0;
for( ; i <= len - vl; i += vl)
{
auto a = __riscv_vle16_v_u16m1(src0 + i, vl);
auto b = __riscv_vle16_v_u16m1(src1 + i, vl);
auto c = __riscv_vle16_v_u16m1(src2 + i, vl);
auto d = __riscv_vle16_v_u16m1(src3 + i, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn, sizeof(ushort)*cn, a, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn + 1, sizeof(ushort)*cn, b, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn + 2, sizeof(ushort)*cn, c, vl);
__riscv_vsse16_v_u16m1(dst + k+i*cn + 3, sizeof(ushort)*cn, d, vl);
}
for( ; i < len; i++ )
{
dst[k+i*cn] = src0[i];
dst[k+i*cn+1] = src1[i];
dst[k+i*cn+2] = src2[i];
dst[k+i*cn+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge32s(const int** src, int* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i, j;
if( k == 1 )
{
const int* src0 = src[0];
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( i = j = 0; i < len; i++, j += cn )
dst[j] = src0[i];
}
else if( k == 2 )
{
const int *src0 = src[0], *src1 = src[1];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
}
}
else if( k == 3 )
{
const int *src0 = src[0], *src1 = src[1], *src2 = src[2];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
dst[j+2] = src2[i];
}
}
else
{
const int *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const int *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
for( i = 0, j = k; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
#if defined __GNUC__
__attribute__((optimize("no-tree-vectorize")))
#endif
inline int merge64s(const int64** src, int64* dst, int len, int cn ) {
int k = cn % 4 ? cn % 4 : 4;
int i, j;
if( k == 1 )
{
const int64* src0 = src[0];
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( i = j = 0; i < len; i++, j += cn )
dst[j] = src0[i];
}
else if( k == 2 )
{
const int64 *src0 = src[0], *src1 = src[1];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
}
}
else if( k == 3 )
{
const int64 *src0 = src[0], *src1 = src[1], *src2 = src[2];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i];
dst[j+1] = src1[i];
dst[j+2] = src2[i];
}
}
else
{
const int64 *src0 = src[0], *src1 = src[1], *src2 = src[2], *src3 = src[3];
i = j = 0;
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
#if defined(__clang__)
#pragma clang loop vectorize(disable)
#endif
for( ; k < cn; k += 4 )
{
const int64 *src0 = src[k], *src1 = src[k+1], *src2 = src[k+2], *src3 = src[k+3];
for( i = 0, j = k; i < len; i++, j += cn )
{
dst[j] = src0[i]; dst[j+1] = src1[i];
dst[j+2] = src2[i]; dst[j+3] = src3[i];
}
}
return CV_HAL_ERROR_OK;
}
}}
#endif
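A plain scalar reference is useful for checking the RVV merge kernels on a host without the vector extension. The sketch below is hypothetical: `mergeReference` is not an OpenCV API. It copies only the channel grouping used above (a leading group of `cn % 4` channels, or 4, then the rest four at a time) and does not vectorize anything.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference (not part of the HAL): interleaves `cn` planar
// channels src[0..cn-1], each `len` elements long, into one packed buffer `dst`.
// Like the merge kernels above, it first writes channels 0..k-1 with
// k = cn % 4 (or 4 when cn is a multiple of 4), then the rest four at a time.
template <typename T>
void mergeReference(const T* const* src, T* dst, int len, int cn)
{
    assert(cn >= 1 && len >= 0);
    int k = cn % 4 ? cn % 4 : 4;
    // Leading group of 1..4 channels.
    for (int i = 0; i < len; i++)
        for (int c = 0; c < k; c++)
            dst[i * cn + c] = src[c][i];
    // Remaining channels; (cn - k) is always a multiple of 4.
    for (; k < cn; k += 4)
        for (int i = 0; i < len; i++)
            for (int c = 0; c < 4; c++)
                dst[i * cn + k + c] = src[k + c][i];
}
```

To cover both the leading group and the four-at-a-time loop, compare its output element by element against `merge32s`, `merge64s`, or the 16-bit kernel whose tail appears above, for cn from 1 through 8.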

3rdparty/hal_rvv/hal_rvv_1p0/minmax.hpp

@ -0,0 +1,335 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_MINMAXIDX_HPP_INCLUDED
#define OPENCV_HAL_RVV_MINMAXIDX_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_minMaxIdx
#define cv_hal_minMaxIdx cv::cv_hal_rvv::minMaxIdx
#undef cv_hal_minMaxIdxMaskStep
#define cv_hal_minMaxIdxMaskStep cv::cv_hal_rvv::minMaxIdx
namespace
{
template<typename T> struct rvv;
#define HAL_RVV_GENERATOR(T, EEW, TYPE, IS_U, EMUL, M_EMUL, B_LEN) \
template<> struct rvv<T> \
{ \
using vec_t = v##IS_U##int##EEW##EMUL##_t; \
using bool_t = vbool##B_LEN##_t; \
static inline size_t vsetvlmax() { return __riscv_vsetvlmax_e##EEW##EMUL(); } \
static inline size_t vsetvl(size_t a) { return __riscv_vsetvl_e##EEW##EMUL(a); } \
static inline vec_t vmv_v_x(T a, size_t b) { return __riscv_vmv_v_x_##TYPE##EMUL(a, b); } \
static inline vec_t vle(const T* a, size_t b) { return __riscv_vle##EEW##_v_##TYPE##EMUL(a, b); } \
static inline vuint8##M_EMUL##_t vle_mask(const uchar* a, size_t b) { return __riscv_vle8_v_u8##M_EMUL(a, b); } \
static inline vec_t vmin_tu(vec_t a, vec_t b, vec_t c, size_t d) { return __riscv_vmin##IS_U##_tu(a, b, c, d); } \
static inline vec_t vmax_tu(vec_t a, vec_t b, vec_t c, size_t d) { return __riscv_vmax##IS_U##_tu(a, b, c, d); } \
static inline vec_t vmin_tumu(bool_t a, vec_t b, vec_t c, vec_t d, size_t e) { return __riscv_vmin##IS_U##_tumu(a, b, c, d, e); } \
static inline vec_t vmax_tumu(bool_t a, vec_t b, vec_t c, vec_t d, size_t e) { return __riscv_vmax##IS_U##_tumu(a, b, c, d, e); } \
static inline vec_t vredmin(vec_t a, vec_t b, size_t c) { return __riscv_vredmin##IS_U(a, b, c); } \
static inline vec_t vredmax(vec_t a, vec_t b, size_t c) { return __riscv_vredmax##IS_U(a, b, c); } \
};
HAL_RVV_GENERATOR(uchar , 8 , u8 , u, m1, m1 , 8 )
HAL_RVV_GENERATOR(schar , 8 , i8 , , m1, m1 , 8 )
HAL_RVV_GENERATOR(ushort, 16, u16, u, m1, mf2, 16)
HAL_RVV_GENERATOR(short , 16, i16, , m1, mf2, 16)
#undef HAL_RVV_GENERATOR
#define HAL_RVV_GENERATOR(T, NAME, EEW, TYPE, IS_F, F_OR_S, F_OR_X, EMUL, M_EMUL, P_EMUL, B_LEN) \
template<> struct rvv<T> \
{ \
using vec_t = v##NAME##EEW##EMUL##_t; \
using bool_t = vbool##B_LEN##_t; \
static inline size_t vsetvlmax() { return __riscv_vsetvlmax_e##EEW##EMUL(); } \
static inline size_t vsetvl(size_t a) { return __riscv_vsetvl_e##EEW##EMUL(a); } \
static inline vec_t vmv_v_x(T a, size_t b) { return __riscv_v##IS_F##mv_v_##F_OR_X##_##TYPE##EMUL(a, b); } \
static inline vuint32##P_EMUL##_t vid(size_t a) { return __riscv_vid_v_u32##P_EMUL(a); } \
static inline vuint32##P_EMUL##_t vundefined() { return __riscv_vundefined_u32##P_EMUL(); } \
static inline vec_t vle(const T* a, size_t b) { return __riscv_vle##EEW##_v_##TYPE##EMUL(a, b); } \
static inline vuint8##M_EMUL##_t vle_mask(const uchar* a, size_t b) { return __riscv_vle8_v_u8##M_EMUL(a, b); } \
static inline bool_t vmlt(vec_t a, vec_t b, size_t c) { return __riscv_vm##F_OR_S##lt(a, b, c); } \
static inline bool_t vmgt(vec_t a, vec_t b, size_t c) { return __riscv_vm##F_OR_S##gt(a, b, c); } \
static inline bool_t vmlt_mu(bool_t a, bool_t b, vec_t c, vec_t d, size_t e) { return __riscv_vm##F_OR_S##lt##_mu(a, b, c, d, e); } \
static inline bool_t vmgt_mu(bool_t a, bool_t b, vec_t c, vec_t d, size_t e) { return __riscv_vm##F_OR_S##gt##_mu(a, b, c, d, e); } \
static inline T vmv_x_s(vec_t a) { return __riscv_v##IS_F##mv_##F_OR_X(a); } \
};
HAL_RVV_GENERATOR(int , int , 32, i32, , s, x, m4, m1 , m4, 8 )
HAL_RVV_GENERATOR(float , float, 32, f32, f, f, f, m4, m1 , m4, 8 )
HAL_RVV_GENERATOR(double, float, 64, f64, f, f, f, m4, mf2, m2, 16)
#undef HAL_RVV_GENERATOR
}
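// Two-pass search used for 8- and 16-bit types: reduce the (masked) minimum and
// maximum first, then rescan for the first row-major (row, col) of each value.
template<typename T>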
inline int minMaxIdxReadTwice(const uchar* src_data, size_t src_step, int width, int height, double* minVal, double* maxVal,
int* minIdx, int* maxIdx, uchar* mask, size_t mask_step)
{
int vlmax = rvv<T>::vsetvlmax();
auto vec_min = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto vec_max = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
T val_min, val_max;
if (mask)
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto vec_mask = rvv<T>::vle_mask(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
vec_min = rvv<T>::vmin_tumu(bool_mask, vec_min, vec_min, vec_src, vl);
vec_max = rvv<T>::vmax_tumu(bool_mask, vec_max, vec_max, vec_src, vl);
}
}
auto sc_minval = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto sc_maxval = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
sc_minval = rvv<T>::vredmin(vec_min, sc_minval, vlmax);
sc_maxval = rvv<T>::vredmax(vec_max, sc_maxval, vlmax);
val_min = __riscv_vmv_x(sc_minval);
val_max = __riscv_vmv_x(sc_maxval);
bool found_min = !minIdx, found_max = !maxIdx;
for (int i = 0; i < height && (!found_min || !found_max); i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width && (!found_min || !found_max); j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto vec_mask = rvv<T>::vle_mask(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto bool_zero = __riscv_vmxor(bool_mask, bool_mask, vl);
if (!found_min)
{
auto bool_minpos = __riscv_vmseq_mu(bool_mask, bool_zero, vec_src, val_min, vl);
int index = __riscv_vfirst(bool_minpos, vl);
if (index != -1)
{
found_min = true;
minIdx[0] = i;
minIdx[1] = j + index;
}
}
if (!found_max)
{
auto bool_maxpos = __riscv_vmseq_mu(bool_mask, bool_zero, vec_src, val_max, vl);
int index = __riscv_vfirst(bool_maxpos, vl);
if (index != -1)
{
found_max = true;
maxIdx[0] = i;
maxIdx[1] = j + index;
}
}
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
vec_min = rvv<T>::vmin_tu(vec_min, vec_min, vec_src, vl);
vec_max = rvv<T>::vmax_tu(vec_max, vec_max, vec_src, vl);
}
}
auto sc_minval = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto sc_maxval = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
sc_minval = rvv<T>::vredmin(vec_min, sc_minval, vlmax);
sc_maxval = rvv<T>::vredmax(vec_max, sc_maxval, vlmax);
val_min = __riscv_vmv_x(sc_minval);
val_max = __riscv_vmv_x(sc_maxval);
bool found_min = !minIdx, found_max = !maxIdx;
for (int i = 0; i < height && (!found_min || !found_max); i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
int vl;
for (int j = 0; j < width && (!found_min || !found_max); j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
if (!found_min)
{
auto bool_minpos = __riscv_vmseq(vec_src, val_min, vl);
int index = __riscv_vfirst(bool_minpos, vl);
if (index != -1)
{
found_min = true;
minIdx[0] = i;
minIdx[1] = j + index;
}
}
if (!found_max)
{
auto bool_maxpos = __riscv_vmseq(vec_src, val_max, vl);
int index = __riscv_vfirst(bool_maxpos, vl);
if (index != -1)
{
found_max = true;
maxIdx[0] = i;
maxIdx[1] = j + index;
}
}
}
}
}
if (minVal)
{
*minVal = val_min;
}
if (maxVal)
{
*maxVal = val_max;
}
return CV_HAL_ERROR_OK;
}
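// One-pass search used for 32- and 64-bit types: each lane keeps its running
// minimum/maximum and the linear position (row * width + col) where it was set;
// the lanes are combined after the loop.
template<typename T>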
inline int minMaxIdxReadOnce(const uchar* src_data, size_t src_step, int width, int height, double* minVal, double* maxVal,
int* minIdx, int* maxIdx, uchar* mask, size_t mask_step)
{
int vlmax = rvv<T>::vsetvlmax();
auto vec_min = rvv<T>::vmv_v_x(std::numeric_limits<T>::max(), vlmax);
auto vec_max = rvv<T>::vmv_v_x(std::numeric_limits<T>::lowest(), vlmax);
auto vec_pos = rvv<T>::vid(vlmax);
auto vec_minpos = rvv<T>::vundefined(), vec_maxpos = rvv<T>::vundefined();
T val_min, val_max;
if (mask)
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto vec_mask = rvv<T>::vle_mask(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto bool_zero = __riscv_vmxor(bool_mask, bool_mask, vl);
auto bool_minpos = rvv<T>::vmlt_mu(bool_mask, bool_zero, vec_src, vec_min, vl);
auto bool_maxpos = rvv<T>::vmgt_mu(bool_mask, bool_zero, vec_src, vec_max, vl);
vec_minpos = __riscv_vmerge_tu(vec_minpos, vec_minpos, vec_pos, bool_minpos, vl);
vec_maxpos = __riscv_vmerge_tu(vec_maxpos, vec_maxpos, vec_pos, bool_maxpos, vl);
vec_min = __riscv_vmerge_tu(vec_min, vec_min, vec_src, bool_minpos, vl);
vec_max = __riscv_vmerge_tu(vec_max, vec_max, vec_src, bool_maxpos, vl);
vec_pos = __riscv_vadd(vec_pos, vl, vlmax);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const T* src_row = reinterpret_cast<const T*>(src_data + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = rvv<T>::vsetvl(width - j);
auto vec_src = rvv<T>::vle(src_row + j, vl);
auto bool_minpos = rvv<T>::vmlt(vec_src, vec_min, vl);
auto bool_maxpos = rvv<T>::vmgt(vec_src, vec_max, vl);
vec_minpos = __riscv_vmerge_tu(vec_minpos, vec_minpos, vec_pos, bool_minpos, vl);
vec_maxpos = __riscv_vmerge_tu(vec_maxpos, vec_maxpos, vec_pos, bool_maxpos, vl);
vec_min = __riscv_vmerge_tu(vec_min, vec_min, vec_src, bool_minpos, vl);
vec_max = __riscv_vmerge_tu(vec_max, vec_max, vec_src, bool_maxpos, vl);
vec_pos = __riscv_vadd(vec_pos, vl, vlmax);
}
}
}
val_min = std::numeric_limits<T>::max();
val_max = std::numeric_limits<T>::lowest();
for (int i = 0; i < vlmax; i++)
{
if (val_min > rvv<T>::vmv_x_s(vec_min))
{
val_min = rvv<T>::vmv_x_s(vec_min);
if (minIdx)
{
minIdx[0] = __riscv_vmv_x(vec_minpos) / width;
minIdx[1] = __riscv_vmv_x(vec_minpos) % width;
}
}
if (val_max < rvv<T>::vmv_x_s(vec_max))
{
val_max = rvv<T>::vmv_x_s(vec_max);
if (maxIdx)
{
maxIdx[0] = __riscv_vmv_x(vec_maxpos) / width;
maxIdx[1] = __riscv_vmv_x(vec_maxpos) % width;
}
}
vec_min = __riscv_vslidedown(vec_min, 1, vlmax);
vec_max = __riscv_vslidedown(vec_max, 1, vlmax);
vec_minpos = __riscv_vslidedown(vec_minpos, 1, vlmax);
vec_maxpos = __riscv_vslidedown(vec_maxpos, 1, vlmax);
}
if (minVal)
{
*minVal = val_min;
}
if (maxVal)
{
*maxVal = val_max;
}
return CV_HAL_ERROR_OK;
}
inline int minMaxIdx(const uchar* src_data, size_t src_step, int width, int height, int depth, double* minVal, double* maxVal,
int* minIdx, int* maxIdx, uchar* mask, size_t mask_step = 0)
{
if (!mask_step)
mask_step = src_step;
switch (depth)
{
case CV_8U:
return minMaxIdxReadTwice<uchar>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_8S:
return minMaxIdxReadTwice<schar>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_16U:
return minMaxIdxReadTwice<ushort>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_16S:
return minMaxIdxReadTwice<short>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_32S:
return minMaxIdxReadOnce<int>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_32F:
return minMaxIdxReadOnce<float>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
case CV_64F:
return minMaxIdxReadOnce<double>(src_data, src_step, width, height, minVal, maxVal, minIdx, maxIdx, mask, mask_step);
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
}}
#endif
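A scalar reference that uses the same index convention makes it easier to cross-check both strategies above (ReadTwice for 8/16-bit, ReadOnce for 32-bit and wider). This is a hypothetical sketch, not OpenCV code. When the mask selects no element, it returns false and leaves the outputs untouched. That behavior is a choice made for the sketch and does not describe what OpenCV does.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical scalar reference (not part of the HAL) for the single-channel
// minMaxIdx kernels above. Rows of `data` are `step` bytes apart; the optional
// mask holds one byte per element with rows `mask_step` bytes apart. Each index
// is the (row, col) of the first occurrence in row-major order, recovered from
// the linear position p as (p / width, p % width), as minMaxIdxReadOnce does.
// Returns false when the mask selects no element.
template <typename T>
bool minMaxIdxReference(const unsigned char* data, size_t step, int width, int height,
                        const unsigned char* mask, size_t mask_step,
                        double* minVal, double* maxVal, int* minIdx, int* maxIdx)
{
    assert(width > 0 && height >= 0);
    T vmin = std::numeric_limits<T>::max();
    T vmax = std::numeric_limits<T>::lowest();
    long long posMin = -1, posMax = -1;
    for (int i = 0; i < height; i++)
    {
        const T* row = reinterpret_cast<const T*>(data + i * step);
        for (int j = 0; j < width; j++)
        {
            if (mask && !mask[i * mask_step + j])
                continue;  // element excluded by the mask
            long long pos = static_cast<long long>(i) * width + j;
            // Strict comparisons keep the earliest position of each extremum.
            if (posMin < 0 || row[j] < vmin) { vmin = row[j]; posMin = pos; }
            if (posMax < 0 || row[j] > vmax) { vmax = row[j]; posMax = pos; }
        }
    }
    if (posMin < 0)
        return false;
    if (minVal) *minVal = static_cast<double>(vmin);
    if (maxVal) *maxVal = static_cast<double>(vmax);
    if (minIdx) { minIdx[0] = static_cast<int>(posMin / width); minIdx[1] = static_cast<int>(posMin % width); }
    if (maxIdx) { maxIdx[0] = static_cast<int>(posMax / width); maxIdx[1] = static_cast<int>(posMax % width); }
    return true;
}
```

Ties are worth testing with this comparison. From a reading of the code, minMaxIdxReadOnce keeps the earliest position within each vector lane and then scans the lanes in order. When equal extrema fall in different lanes, the lane order rather than the element order appears to decide which index gets reported, so the result may differ from this reference.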

3rdparty/hal_rvv/hal_rvv_1p0/norm.hpp

@ -0,0 +1,517 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_NORM_HPP_INCLUDED
#define OPENCV_HAL_RVV_NORM_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_norm
#define cv_hal_norm cv::cv_hal_rvv::norm
inline int normInf_8UC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m8(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
vec_max = __riscv_vmaxu_tumu(bool_mask, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normL1_8UC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normL2Sqr_8UC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
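// Flush the 32-bit lanes into *result before 2^16 elements have been accumulated:
// each element adds at most 255 * 255, and 2^16 * 255^2 < 2^32, so no lane overflows.
auto reduce = [&](int vl) {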
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normInf_8UC4(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
vlm = __riscv_vsetvl_e8m2(width - jm);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + jm, vlm);
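// Expand each per-pixel mask byte to its four channel bytes: clamp to 0/1,
// zero-extend to 32 bits and multiply by 0x01010101 to replicate the byte.
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);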
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m8(vec_mask_ext), 0, vl);
vec_max = __riscv_vmaxu_tumu(bool_mask_ext, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
auto vec_src = __riscv_vle8_v_u8m8(src_row + j, vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normL1_8UC4(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask_ext, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normL2Sqr_8UC4(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
auto reduce = [&](int vl) {
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask_ext, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask_ext, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src_row = src + i * src_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
reduce(vl);
auto vec_src = __riscv_vle8_v_u8m2(src_row + j, vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normInf_32FC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m8();
auto vec_max = __riscv_vfmv_v_f_f32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src = __riscv_vle32_v_f32m8(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_abs = __riscv_vfabs_v_f32m8_m(bool_mask, vec_src, vl);
vec_max = __riscv_vfmax_tumu(bool_mask, vec_max, vec_max, vec_abs, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src = __riscv_vle32_v_f32m8(src_row + j, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
vec_max = __riscv_vfmax_tu(vec_max, vec_max, vec_abs, vl);
}
}
}
auto sc_max = __riscv_vfmv_s_f_f32m1(0, vlmax);
sc_max = __riscv_vfredmax(vec_max, sc_max, vlmax);
*result = __riscv_vfmv_f(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normL1_32FC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_abs = __riscv_vfabs_v_f32m4_m(bool_mask, vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8_m(bool_mask, vec_abs, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_fext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8(vec_abs, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_fext, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normL2Sqr_32FC1(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_mul = __riscv_vfwmul_vv_f64m8_m(bool_mask, vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_mul, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src_row = reinterpret_cast<const float*>(src + i * src_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src = __riscv_vle32_v_f32m4(src_row + j, vl);
auto vec_mul = __riscv_vfwmul(vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_mul, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int norm(const uchar* src, size_t src_step, const uchar* mask, size_t mask_step, int width,
int height, int type, int norm_type, double* result)
{
if (!result)
return CV_HAL_ERROR_OK;
switch (type)
{
case CV_8UC1:
switch (norm_type)
{
case NORM_INF:
return normInf_8UC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L1:
return normL1_8UC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2SQR:
return normL2Sqr_8UC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2:
{
int ret = normL2Sqr_8UC1(src, src_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
return ret;
}
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
case CV_8UC4:
switch (norm_type)
{
case NORM_INF:
return normInf_8UC4(src, src_step, mask, mask_step, width, height, result);
case NORM_L1:
return normL1_8UC4(src, src_step, mask, mask_step, width, height, result);
case NORM_L2SQR:
return normL2Sqr_8UC4(src, src_step, mask, mask_step, width, height, result);
case NORM_L2:
{
int ret = normL2Sqr_8UC4(src, src_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
return ret;
}
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
case CV_32FC1:
switch (norm_type)
{
case NORM_INF:
return normInf_32FC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L1:
return normL1_32FC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2SQR:
return normL2Sqr_32FC1(src, src_step, mask, mask_step, width, height, result);
case NORM_L2:
{
int ret = normL2Sqr_32FC1(src, src_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
return ret;
}
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
}}
#endif
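The norm kernels depend on two small invariants that are easy to check on their own. The first is the overflow bound behind the periodic `reduce` flush in the 8-bit L2Sqr kernels. The second is the byte replication that turns one mask byte into four channel-mask bytes for CV_8UC4. The sketch below states both and adds a scalar reference for the 8-bit norms. `normReference8u`, `expandMaskByte` and `NormKind` are hypothetical names, not OpenCV APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// The 8-bit L2Sqr kernels above add at most 255 * 255 per element to each
// 32-bit lane and flush before 2^16 elements have been accumulated, so a lane
// never exceeds 2^16 * 255^2 = 4,261,478,400 < 2^32.
static_assert(65536ull * 255 * 255 < (1ull << 32), "32-bit lane cannot overflow between flushes");

// Hypothetical stand-in for the NORM_* flags; not an OpenCV type.
enum class NormKind { Inf, L1, L2Sqr, L2 };

// Scalar model of the per-pixel mask expansion in the 8UC4 kernels: clamp the
// mask byte to 0/1, widen to 32 bits and multiply by 0x01010101 so that each of
// the pixel's four channel bytes is nonzero exactly when the mask byte is.
inline uint32_t expandMaskByte(uint8_t m)
{
    return static_cast<uint32_t>(m < 1 ? m : 1) * 0x01010101u;
}

// Hypothetical scalar reference (not part of the HAL) for the 8UC1/8UC4 norms.
double normReference8u(const uint8_t* src, size_t src_step, const uint8_t* mask, size_t mask_step,
                       int width, int height, int cn, NormKind kind)
{
    assert(cn == 1 || cn == 4);
    double inf = 0, l1 = 0, l2 = 0;
    for (int i = 0; i < height; i++)
    {
        const uint8_t* row = src + i * src_step;
        for (int j = 0; j < width; j++)
        {
            if (mask && !mask[i * mask_step + j])
                continue;  // one mask byte covers all cn channels of the pixel
            for (int c = 0; c < cn; c++)
            {
                double v = row[j * cn + c];
                if (v > inf) inf = v;
                l1 += v;
                l2 += v * v;
            }
        }
    }
    switch (kind)
    {
    case NormKind::Inf:   return inf;
    case NormKind::L1:    return l1;
    case NormKind::L2Sqr: return l2;
    case NormKind::L2:    return std::sqrt(l2);
    }
    return 0;
}
```

The 8-bit L1 kernels do not flush their lanes. By the same arithmetic, a 32-bit lane could only overflow after roughly 2^32 / 255 (about 16.8 million) additions to that lane, so this limit matters only for very large inputs.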

@ -0,0 +1,605 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_NORM_DIFF_HPP_INCLUDED
#define OPENCV_HAL_RVV_NORM_DIFF_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_normDiff
#define cv_hal_normDiff cv::cv_hal_rvv::normDiff
inline int normDiffInf_8UC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m8(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vsub_vv_u8m8_m(bool_mask, __riscv_vmaxu_vv_u8m8_m(bool_mask, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m8_m(bool_mask, vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tumu(bool_mask, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m8(width - j);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normDiffL1_8UC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask, __riscv_vmaxu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiffL2Sqr_8UC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
auto reduce = [&](int vl) {
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask, __riscv_vmaxu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask, vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e8m2(width - j);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normDiffInf_8UC4(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m8();
auto vec_max = __riscv_vmv_v_x_u8m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
vlm = __riscv_vsetvl_e8m2(width - jm);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m8(vec_mask_ext), 0, vl);
auto vec_src = __riscv_vsub_vv_u8m8_m(bool_mask_ext, __riscv_vmaxu_vv_u8m8_m(bool_mask_ext, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m8_m(bool_mask_ext, vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tumu(bool_mask_ext, vec_max, vec_max, vec_src, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m8(width * 4 - j);
auto vec_src1 = __riscv_vle8_v_u8m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m8(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
vec_max = __riscv_vmaxu_tu(vec_max, vec_max, vec_src, vl);
}
}
}
auto sc_max = __riscv_vmv_s_x_u8m1(0, vlmax);
sc_max = __riscv_vredmaxu(vec_max, sc_max, vlmax);
*result = __riscv_vmv_x(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normDiffL1_8UC4(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask_ext, __riscv_vmaxu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4_u32m8_m(bool_mask_ext, vec_src, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_zext = __riscv_vzext_vf4(vec_src, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
auto sc_sum = __riscv_vmv_s_x_u32m1(0, vlmax);
sc_sum = __riscv_vredsum(vec_sum, sc_sum, vlmax);
*result = __riscv_vmv_x(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiffL2Sqr_8UC4(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e8m2();
auto vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
int cnt = 0;
auto reduce = [&](int vl) {
if ((cnt += vl) < (1 << 16))
return;
cnt = vl;
for (int i = 0; i < vlmax; i++)
{
*result += __riscv_vmv_x(vec_sum);
vec_sum = __riscv_vslidedown(vec_sum, 1, vlmax);
}
vec_sum = __riscv_vmv_v_x_u32m8(0, vlmax);
};
*result = 0;
if (mask)
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
const uchar* mask_row = mask + i * mask_step;
int vl, vlm;
for (int j = 0, jm = 0; j < width * 4; j += vl, jm += vlm)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
vlm = __riscv_vsetvl_e8mf2(width - jm);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8mf2(mask_row + jm, vlm);
auto vec_mask_ext = __riscv_vmul(__riscv_vzext_vf4(__riscv_vminu(vec_mask, 1, vlm), vlm), 0x01010101, vlm);
auto bool_mask_ext = __riscv_vmsne(__riscv_vreinterpret_u8m2(vec_mask_ext), 0, vl);
auto vec_src = __riscv_vsub_vv_u8m2_m(bool_mask_ext, __riscv_vmaxu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl),
__riscv_vminu_vv_u8m2_m(bool_mask_ext, vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu_vv_u16m4_m(bool_mask_ext, vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2_u32m8_m(bool_mask_ext, vec_mul, vl);
vec_sum = __riscv_vadd_tumu(bool_mask_ext, vec_sum, vec_sum, vec_zext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const uchar* src1_row = src1 + i * src1_step;
const uchar* src2_row = src2 + i * src2_step;
int vl;
for (int j = 0; j < width * 4; j += vl)
{
vl = __riscv_vsetvl_e8m2(width * 4 - j);
reduce(vl);
auto vec_src1 = __riscv_vle8_v_u8m2(src1_row + j, vl);
auto vec_src2 = __riscv_vle8_v_u8m2(src2_row + j, vl);
auto vec_src = __riscv_vsub(__riscv_vmaxu(vec_src1, vec_src2, vl), __riscv_vminu(vec_src1, vec_src2, vl), vl);
auto vec_mul = __riscv_vwmulu(vec_src, vec_src, vl);
auto vec_zext = __riscv_vzext_vf2(vec_mul, vl);
vec_sum = __riscv_vadd_tu(vec_sum, vec_sum, vec_zext, vl);
}
}
}
reduce(1 << 16);
return CV_HAL_ERROR_OK;
}
inline int normDiffInf_32FC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m8();
auto vec_max = __riscv_vfmv_v_f_f32m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src1 = __riscv_vle32_v_f32m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m8(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m2(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vfsub_vv_f32m8_m(bool_mask, vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs_v_f32m8_m(bool_mask, vec_src, vl);
vec_max = __riscv_vfmax_tumu(bool_mask, vec_max, vec_max, vec_abs, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m8(width - j);
auto vec_src1 = __riscv_vle32_v_f32m8(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m8(src2_row + j, vl);
auto vec_src = __riscv_vfsub(vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
vec_max = __riscv_vfmax_tu(vec_max, vec_max, vec_abs, vl);
}
}
}
auto sc_max = __riscv_vfmv_s_f_f32m1(0, vlmax);
sc_max = __riscv_vfredmax(vec_max, sc_max, vlmax);
*result = __riscv_vfmv_f(sc_max);
return CV_HAL_ERROR_OK;
}
inline int normDiffL1_32FC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vfsub_vv_f32m4_m(bool_mask, vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs_v_f32m4_m(bool_mask, vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8_m(bool_mask, vec_abs, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_fext, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_src = __riscv_vfsub(vec_src1, vec_src2, vl);
auto vec_abs = __riscv_vfabs(vec_src, vl);
auto vec_fext = __riscv_vfwcvt_f_f_v_f64m8(vec_abs, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_fext, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiffL2Sqr_32FC1(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask, size_t mask_step, int width, int height, double* result)
{
int vlmax = __riscv_vsetvlmax_e32m4();
auto vec_sum = __riscv_vfmv_v_f_f64m8(0, vlmax);
if (mask)
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
const uchar* mask_row = mask + i * mask_step;
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_mask = __riscv_vle8_v_u8m1(mask_row + j, vl);
auto bool_mask = __riscv_vmsne(vec_mask, 0, vl);
auto vec_src = __riscv_vfsub_vv_f32m4_m(bool_mask, vec_src1, vec_src2, vl);
auto vec_mul = __riscv_vfwmul_vv_f64m8_m(bool_mask, vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tumu(bool_mask, vec_sum, vec_sum, vec_mul, vl);
}
}
}
else
{
for (int i = 0; i < height; i++)
{
const float* src1_row = reinterpret_cast<const float*>(src1 + i * src1_step);
const float* src2_row = reinterpret_cast<const float*>(src2 + i * src2_step);
int vl;
for (int j = 0; j < width; j += vl)
{
vl = __riscv_vsetvl_e32m4(width - j);
auto vec_src1 = __riscv_vle32_v_f32m4(src1_row + j, vl);
auto vec_src2 = __riscv_vle32_v_f32m4(src2_row + j, vl);
auto vec_src = __riscv_vfsub(vec_src1, vec_src2, vl);
auto vec_mul = __riscv_vfwmul(vec_src, vec_src, vl);
vec_sum = __riscv_vfadd_tu(vec_sum, vec_sum, vec_mul, vl);
}
}
}
auto sc_sum = __riscv_vfmv_s_f_f64m1(0, vlmax);
sc_sum = __riscv_vfredosum(vec_sum, sc_sum, vlmax);
*result = __riscv_vfmv_f(sc_sum);
return CV_HAL_ERROR_OK;
}
inline int normDiff(const uchar* src1, size_t src1_step, const uchar* src2, size_t src2_step, const uchar* mask,
size_t mask_step, int width, int height, int type, int norm_type, double* result)
{
if (!result)
return CV_HAL_ERROR_OK;
int ret;
switch (type)
{
case CV_8UC1:
switch (norm_type & ~NORM_RELATIVE)
{
case NORM_INF:
ret = normDiffInf_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L1:
ret = normDiffL1_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2SQR:
ret = normDiffL2Sqr_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2:
ret = normDiffL2Sqr_8UC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
break;
case CV_8UC4:
switch (norm_type & ~NORM_RELATIVE)
{
case NORM_INF:
ret = normDiffInf_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L1:
ret = normDiffL1_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2SQR:
ret = normDiffL2Sqr_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2:
ret = normDiffL2Sqr_8UC4(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
break;
case CV_32FC1:
switch (norm_type & ~NORM_RELATIVE)
{
case NORM_INF:
ret = normDiffInf_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L1:
ret = normDiffL1_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2SQR:
ret = normDiffL2Sqr_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
break;
case NORM_L2:
ret = normDiffL2Sqr_32FC1(src1, src1_step, src2, src2_step, mask, mask_step, width, height, result);
*result = std::sqrt(*result);
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
break;
default:
ret = CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if(ret == CV_HAL_ERROR_OK && (norm_type & NORM_RELATIVE))
{
double result_;
ret = cv::cv_hal_rvv::norm(src2, src2_step, mask, mask_step, width, height, type, norm_type & ~NORM_RELATIVE, &result_);
if(ret == CV_HAL_ERROR_OK)
{
*result /= result_ + DBL_EPSILON;
}
}
return ret;
}
}}
#endif
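The RVV kernels above vectorize a simple per-pixel computation, and it is easier to verify them against a scalar loop. The sketch below is a hypothetical cross-check, not part of OpenCV; `refNormDiff8u` and `RefNorm` are illustrative names. It assumes the same layout as the HAL entry points: row strides in bytes and one mask byte per pixel. A non-zero mask byte selects all `cn` channels of that pixel, which is what the `0x01010101` broadcast achieves in the 8UC4 paths. `NORM_L2` and `NORM_RELATIVE` are left out because `normDiff` derives them from these results: it takes the square root of the L2SQR value, and it divides by `norm(src2) + DBL_EPSILON`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical scalar reference (not OpenCV API) for the 8-bit normDiff kernels.
enum RefNorm { REF_INF, REF_L1, REF_L2SQR };

// Computes NORM_INF, NORM_L1 or NORM_L2SQR of (src1 - src2) for an image with
// cn interleaved channels (cn = 1 for 8UC1, cn = 4 for 8UC4).
// Steps are in bytes; the mask, if present, holds one byte per pixel.
double refNormDiff8u(const unsigned char* src1, std::size_t src1_step,
                     const unsigned char* src2, std::size_t src2_step,
                     const unsigned char* mask, std::size_t mask_step,
                     int width, int height, int cn, RefNorm norm)
{
    double result = 0;
    for (int i = 0; i < height; i++)
    {
        const unsigned char* r1 = src1 + i * src1_step;
        const unsigned char* r2 = src2 + i * src2_step;
        const unsigned char* rm = mask ? mask + i * mask_step : nullptr;
        for (int x = 0; x < width; x++)
        {
            if (rm && rm[x] == 0)
                continue; // masked-out pixel: skip all of its channels
            for (int c = 0; c < cn; c++)
            {
                int d = std::abs(int(r1[x * cn + c]) - int(r2[x * cn + c]));
                switch (norm)
                {
                case REF_INF:   result = std::max(result, double(d)); break;
                case REF_L1:    result += d; break;
                case REF_L2SQR: result += double(d) * d; break;
                }
            }
        }
    }
    return result;
}
```

The L2Sqr kernels flush their 32-bit lane accumulators into the `double` result after fewer than `1 << 16` elements. Each lane receives at most one squared difference per iteration, and a squared difference is at most 255 * 255 = 65025. Since 65536 * 65025 = 4261478400 < 2^32, the lanes cannot overflow between flushes.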

3rdparty/hal_rvv/hal_rvv_1p0/split.hpp

@ -0,0 +1,93 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef OPENCV_HAL_RVV_SPLIT_HPP_INCLUDED
#define OPENCV_HAL_RVV_SPLIT_HPP_INCLUDED
#include <riscv_vector.h>
namespace cv { namespace cv_hal_rvv {
#undef cv_hal_split8u
#define cv_hal_split8u cv::cv_hal_rvv::split8u
inline int split8u(const uchar* src, uchar** dst, int len, int cn)
{
int vl = 0;
if (cn == 1)
{
uchar* dst0 = dst[0];
for (int i = 0; i < len; i += vl)
{
vl = __riscv_vsetvl_e8m8(len - i);
__riscv_vse8_v_u8m8(dst0 + i, __riscv_vle8_v_u8m8(src + i, vl), vl);
}
}
else if (cn == 2)
{
uchar *dst0 = dst[0], *dst1 = dst[1];
for (int i = 0; i < len; i += vl)
{
vl = __riscv_vsetvl_e8m4(len - i);
vuint8m4x2_t seg = __riscv_vlseg2e8_v_u8m4x2(src + i * cn, vl);
__riscv_vse8_v_u8m4(dst0 + i, __riscv_vget_v_u8m4x2_u8m4(seg, 0), vl);
__riscv_vse8_v_u8m4(dst1 + i, __riscv_vget_v_u8m4x2_u8m4(seg, 1), vl);
}
}
else if (cn == 3)
{
uchar *dst0 = dst[0], *dst1 = dst[1], *dst2 = dst[2];
for (int i = 0; i < len; i += vl)
{
vl = __riscv_vsetvl_e8m2(len - i);
vuint8m2x3_t seg = __riscv_vlseg3e8_v_u8m2x3(src + i * cn, vl);
__riscv_vse8_v_u8m2(dst0 + i, __riscv_vget_v_u8m2x3_u8m2(seg, 0), vl);
__riscv_vse8_v_u8m2(dst1 + i, __riscv_vget_v_u8m2x3_u8m2(seg, 1), vl);
__riscv_vse8_v_u8m2(dst2 + i, __riscv_vget_v_u8m2x3_u8m2(seg, 2), vl);
}
}
else if (cn == 4)
{
uchar *dst0 = dst[0], *dst1 = dst[1], *dst2 = dst[2], *dst3 = dst[3];
for (int i = 0; i < len; i += vl)
{
vl = __riscv_vsetvl_e8m2(len - i);
vuint8m2x4_t seg = __riscv_vlseg4e8_v_u8m2x4(src + i * cn, vl);
__riscv_vse8_v_u8m2(dst0 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 0), vl);
__riscv_vse8_v_u8m2(dst1 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 1), vl);
__riscv_vse8_v_u8m2(dst2 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 2), vl);
__riscv_vse8_v_u8m2(dst3 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 3), vl);
}
}
else
{
int k = 0;
for (; k <= cn - 4; k += 4)
{
uchar *dst0 = dst[k], *dst1 = dst[k + 1], *dst2 = dst[k + 2], *dst3 = dst[k + 3];
for (int i = 0; i < len; i += vl)
{
vl = __riscv_vsetvl_e8m2(len - i);
vuint8m2x4_t seg = __riscv_vlsseg4e8_v_u8m2x4(src + k + i * cn, cn, vl);
__riscv_vse8_v_u8m2(dst0 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 0), vl);
__riscv_vse8_v_u8m2(dst1 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 1), vl);
__riscv_vse8_v_u8m2(dst2 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 2), vl);
__riscv_vse8_v_u8m2(dst3 + i, __riscv_vget_v_u8m2x4_u8m2(seg, 3), vl);
}
}
for (; k < cn; ++k)
{
uchar* dstK = dst[k];
for (int i = 0; i < len; i += vl)
{
vl = __riscv_vsetvl_e8m2(len - i);
vuint8m2_t seg = __riscv_vlse8_v_u8m2(src + k + i * cn, cn, vl);
__riscv_vse8_v_u8m2(dstK + i, seg, vl);
}
}
}
return CV_HAL_ERROR_OK;
}
}}
#endif
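`split8u` uses unit-stride segment loads (`vlseg2e8`, `vlseg3e8`, `vlseg4e8`) for up to four channels. For wider inputs it de-interleaves four channels at a time with the strided segment load `vlsseg4e8` (byte stride `cn`), then handles any remaining channels with the strided load `vlse8`. All branches compute the same result as the scalar loop below. That loop is a hypothetical reference (`refSplit8u` is not an OpenCV function) that may be useful when testing the vector path on hardware or under an emulator.

```cpp
#include <cassert>

// Hypothetical scalar reference (not OpenCV API) for split8u:
// de-interleaves len pixels of a cn-channel buffer into cn planar buffers.
void refSplit8u(const unsigned char* src, unsigned char** dst, int len, int cn)
{
    for (int i = 0; i < len; i++)
        for (int k = 0; k < cn; k++)
            dst[k][i] = src[i * cn + k]; // channel k of pixel i
}
```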

@ -2,7 +2,7 @@ function(download_ippicv root_var)
set(${root_var} "" PARENT_SCOPE)
# Commit SHA in the opencv_3rdparty repo
set(IPPICV_COMMIT "7f55c0c26be418d494615afca15218566775c725")
set(IPPICV_COMMIT "d1cbea44d326eb0421fedcdd16de4630fd8c7ed0")
# Define actual ICV versions
if(APPLE)
set(IPPICV_COMMIT "0cc4aa06bf2bef4b05d237c69a5a96b9cd0cb85a")
@ -14,9 +14,10 @@ function(download_ippicv root_var)
set(OPENCV_ICV_PLATFORM "linux")
set(OPENCV_ICV_PACKAGE_SUBDIR "ippicv_lnx")
if(X86_64)
set(OPENCV_ICV_NAME "ippicv_2021.12.0_lnx_intel64_20240425_general.tgz")
set(OPENCV_ICV_HASH "d06e6d44ece88f7f17a6cd9216761186")
set(OPENCV_ICV_NAME "ippicv_2022.0.0_lnx_intel64_20240904_general.tgz")
set(OPENCV_ICV_HASH "63717ee0f918ad72fb5a737992a206d1")
else()
set(IPPICV_COMMIT "7f55c0c26be418d494615afca15218566775c725")
set(OPENCV_ICV_NAME "ippicv_2021.12.0_lnx_ia32_20240425_general.tgz")
set(OPENCV_ICV_HASH "85ffa2b9ed7802b93c23fa27b0097d36")
endif()
@ -24,9 +25,10 @@ function(download_ippicv root_var)
set(OPENCV_ICV_PLATFORM "windows")
set(OPENCV_ICV_PACKAGE_SUBDIR "ippicv_win")
if(X86_64)
set(OPENCV_ICV_NAME "ippicv_2021.12.0_win_intel64_20240425_general.zip")
set(OPENCV_ICV_HASH "402ff8c6b4986738fed71c44e1ce665d")
set(OPENCV_ICV_NAME "ippicv_2022.0.0_win_intel64_20240904_general.zip")
set(OPENCV_ICV_HASH "3a6eca7cc3bce7159eb1443c6fca4e31")
else()
set(IPPICV_COMMIT "7f55c0c26be418d494615afca15218566775c725")
set(OPENCV_ICV_NAME "ippicv_2021.12.0_win_ia32_20240425_general.zip")
set(OPENCV_ICV_HASH "8b1d2a23957d57624d0de8f2a5cae5f1")
endif()

@ -24,7 +24,6 @@ set(ITT_PUBLIC_HDRS
include/ittnotify.h
include/jitprofiling.h
include/libittnotify.h
include/llvm_jit_event_listener.hpp
)
set(ITT_PRIVATE_HDRS
src/ittnotify/disable_warnings.h
@ -39,6 +38,11 @@ set(ITT_SRCS
add_library(${ITT_LIBRARY} STATIC ${OPENCV_3RDPARTY_EXCLUDE_FROM_ALL} ${ITT_SRCS} ${ITT_PUBLIC_HDRS} ${ITT_PRIVATE_HDRS})
file(STRINGS "src/ittnotify/ittnotify_config.h" API_VERSION_NUM REGEX "#define\[ \t]+API_VERSION_NUM[ \t]+([0-9\.]+)")
if(API_VERSION_NUM MATCHES "#define\[ \t]+API_VERSION_NUM[ \t]+([0-9\.]*)")
set(ITTNOTIFY_VERSION "${CMAKE_MATCH_1}" CACHE INTERNAL "" FORCE)
endif()
if(NOT WIN32)
if(HAVE_DL_LIBRARY)
target_link_libraries(${ITT_LIBRARY} dl)
@ -64,4 +68,4 @@ if(NOT BUILD_SHARED_LIBS)
ocv_install_target(${ITT_LIBRARY} EXPORT OpenCVModules ARCHIVE DESTINATION ${OPENCV_3P_LIB_INSTALL_PATH} COMPONENT dev OPTIONAL)
endif()
ocv_install_3rdparty_licenses(ittnotify src/ittnotify/LICENSE.BSD src/ittnotify/LICENSE.GPL)
ocv_install_3rdparty_licenses(ittnotify src/ittnotify/BSD-3-Clause.txt src/ittnotify/GPL-2.0-only.txt)

@ -1,60 +1,8 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#ifndef _ITTNOTIFY_H_
#define _ITTNOTIFY_H_
@ -63,7 +11,8 @@
@brief Public User API functions and types
@mainpage
The ITT API is used to annotate a user's program with additional information
The Instrumentation and Tracing Technology API (ITT API) is used to
annotate a user's program with additional information
that can be used by correctness and performance tools. The user inserts
calls in their program. Those calls generate information that is collected
at runtime, and used by Intel(R) Threading Tools.
@ -141,6 +90,10 @@ The same ID may not be reused for different instances, unless a previous
# define ITT_OS_FREEBSD 4
#endif /* ITT_OS_FREEBSD */
#ifndef ITT_OS_OPENBSD
# define ITT_OS_OPENBSD 5
#endif /* ITT_OS_OPENBSD */
#ifndef ITT_OS
# if defined WIN32 || defined _WIN32
# define ITT_OS ITT_OS_WIN
@ -148,6 +101,8 @@ The same ID may not be reused for different instances, unless a previous
# define ITT_OS ITT_OS_MAC
# elif defined( __FreeBSD__ )
# define ITT_OS ITT_OS_FREEBSD
# elif defined( __OpenBSD__)
# define ITT_OS ITT_OS_OPENBSD
# else
# define ITT_OS ITT_OS_LINUX
# endif
@ -169,6 +124,10 @@ The same ID may not be reused for different instances, unless a previous
# define ITT_PLATFORM_FREEBSD 4
#endif /* ITT_PLATFORM_FREEBSD */
#ifndef ITT_PLATFORM_OPENBSD
# define ITT_PLATFORM_OPENBSD 5
#endif /* ITT_PLATFORM_OPENBSD */
#ifndef ITT_PLATFORM
# if ITT_OS==ITT_OS_WIN
# define ITT_PLATFORM ITT_PLATFORM_WIN
@ -176,6 +135,8 @@ The same ID may not be reused for different instances, unless a previous
# define ITT_PLATFORM ITT_PLATFORM_MAC
# elif ITT_OS==ITT_OS_FREEBSD
# define ITT_PLATFORM ITT_PLATFORM_FREEBSD
# elif ITT_OS==ITT_OS_OPENBSD
# define ITT_PLATFORM ITT_PLATFORM_OPENBSD
# else
# define ITT_PLATFORM ITT_PLATFORM_POSIX
# endif
@ -228,7 +189,12 @@ The same ID may not be reused for different instances, unless a previous
#if ITT_PLATFORM==ITT_PLATFORM_WIN
/* use __forceinline (VC++ specific) */
#define ITT_INLINE __forceinline
#if defined(__MINGW32__) && !defined(__cplusplus)
#define ITT_INLINE static __inline__ __attribute__((__always_inline__,__gnu_inline__))
#else
#define ITT_INLINE static __forceinline
#endif /* __MINGW32__ */
#define ITT_INLINE_ATTRIBUTE /* nothing */
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
/*
@ -289,20 +255,20 @@ The same ID may not be reused for different instances, unless a previous
#define ITTNOTIFY_VOID(n) (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)
#define ITTNOTIFY_DATA(n) (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)
#define ITTNOTIFY_VOID_D0(n,d) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_VOID_D1(n,d,x) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_VOID_D2(n,d,x,y) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_VOID_D3(n,d,x,y,z) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_VOID_D4(n,d,x,y,z,a) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_VOID_D5(n,d,x,y,z,a,b) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_VOID_D6(n,d,x,y,z,a,b,c) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#define ITTNOTIFY_DATA_D0(n,d) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_DATA_D1(n,d,x) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_DATA_D2(n,d,x,y) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_DATA_D3(n,d,x,y,z) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_DATA_D4(n,d,x,y,z,a) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_DATA_D5(n,d,x,y,z,a,b) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_DATA_D6(n,d,x,y,z,a,b,c) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#define ITTNOTIFY_VOID_D0(n,d) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_VOID_D1(n,d,x) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_VOID_D2(n,d,x,y) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_VOID_D3(n,d,x,y,z) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_VOID_D4(n,d,x,y,z,a) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_VOID_D5(n,d,x,y,z,a,b) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_VOID_D6(n,d,x,y,z,a,b,c) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#define ITTNOTIFY_DATA_D0(n,d) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_DATA_D1(n,d,x) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_DATA_D2(n,d,x,y) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_DATA_D3(n,d,x,y,z) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_DATA_D4(n,d,x,y,z,a) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_DATA_D5(n,d,x,y,z,a,b) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_DATA_D6(n,d,x,y,z,a,b,c) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#ifdef ITT_STUB
#undef ITT_STUB
@@ -340,7 +306,7 @@ extern "C" {
* only pauses tracing and analyzing memory access.
* It does not pause tracing or analyzing threading APIs.
* .
* - Intel(R) Parallel Amplifier and Intel(R) VTune(TM) Amplifier XE:
* - Intel(R) VTune(TM) Profiler:
* - Does continue to record when new threads are started.
* .
* - Other effects:
@@ -355,35 +321,143 @@ void ITTAPI __itt_resume(void);
/** @brief Detach collection */
void ITTAPI __itt_detach(void);
/**
* @enum __itt_collection_scope
* @brief Enumerator for collection scopes
*/
typedef enum {
__itt_collection_scope_host = 1 << 0,
__itt_collection_scope_offload = 1 << 1,
__itt_collection_scope_all = 0x7FFFFFFF
} __itt_collection_scope;
/** @brief Pause scoped collection */
void ITTAPI __itt_pause_scoped(__itt_collection_scope);
/** @brief Resume scoped collection */
void ITTAPI __itt_resume_scoped(__itt_collection_scope);
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
ITT_STUBV(ITTAPI, void, pause, (void))
ITT_STUBV(ITTAPI, void, resume, (void))
ITT_STUBV(ITTAPI, void, detach, (void))
#define __itt_pause ITTNOTIFY_VOID(pause)
#define __itt_pause_ptr ITTNOTIFY_NAME(pause)
#define __itt_resume ITTNOTIFY_VOID(resume)
#define __itt_resume_ptr ITTNOTIFY_NAME(resume)
#define __itt_detach ITTNOTIFY_VOID(detach)
#define __itt_detach_ptr ITTNOTIFY_NAME(detach)
ITT_STUBV(ITTAPI, void, pause, (void))
ITT_STUBV(ITTAPI, void, pause_scoped, (__itt_collection_scope))
ITT_STUBV(ITTAPI, void, resume, (void))
ITT_STUBV(ITTAPI, void, resume_scoped, (__itt_collection_scope))
ITT_STUBV(ITTAPI, void, detach, (void))
#define __itt_pause ITTNOTIFY_VOID(pause)
#define __itt_pause_ptr ITTNOTIFY_NAME(pause)
#define __itt_pause_scoped ITTNOTIFY_VOID(pause_scoped)
#define __itt_pause_scoped_ptr ITTNOTIFY_NAME(pause_scoped)
#define __itt_resume ITTNOTIFY_VOID(resume)
#define __itt_resume_ptr ITTNOTIFY_NAME(resume)
#define __itt_resume_scoped ITTNOTIFY_VOID(resume_scoped)
#define __itt_resume_scoped_ptr ITTNOTIFY_NAME(resume_scoped)
#define __itt_detach ITTNOTIFY_VOID(detach)
#define __itt_detach_ptr ITTNOTIFY_NAME(detach)
#else /* INTEL_NO_ITTNOTIFY_API */
#define __itt_pause()
#define __itt_pause_ptr 0
#define __itt_pause_ptr 0
#define __itt_pause_scoped(scope)
#define __itt_pause_scoped_ptr 0
#define __itt_resume()
#define __itt_resume_ptr 0
#define __itt_resume_ptr 0
#define __itt_resume_scoped(scope)
#define __itt_resume_scoped_ptr 0
#define __itt_detach()
#define __itt_detach_ptr 0
#define __itt_detach_ptr 0
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#define __itt_pause_ptr 0
#define __itt_resume_ptr 0
#define __itt_detach_ptr 0
#define __itt_pause_ptr 0
#define __itt_pause_scoped_ptr 0
#define __itt_resume_ptr 0
#define __itt_resume_scoped_ptr 0
#define __itt_detach_ptr 0
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
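When `INTEL_NO_ITTNOTIFY_API` is defined, `__itt_pause_scoped(scope)` and `__itt_resume_scoped(scope)` expand to nothing, and their `_ptr` forms expand to `0`. As a result, the argument expression is discarded, and any side effect inside it never runs.

The sketch below reproduces both configurations. It uses hypothetical `DEMO_*` macros and a local copy of the `__itt_collection_scope` values, and it does not include ittnotify.h.

```c
/* Local copy of the scope values from __itt_collection_scope. */
typedef enum {
    demo_scope_host    = 1 << 0,
    demo_scope_offload = 1 << 1,
    demo_scope_all     = 0x7FFFFFFF
} demo_scope;

static int demo_paused_mask = 0;      /* scopes paused by the "enabled" variant */
static int demo_arg_evaluations = 0;  /* how often the argument was evaluated */

static void demo_pause_scoped_impl(demo_scope s) { demo_paused_mask |= (int)s; }

static demo_scope demo_pick_scope(void)
{
    ++demo_arg_evaluations;  /* side effect that the disabled macro drops */
    return demo_scope_host;
}

/* "Enabled" variant: the macro forwards to a real function. */
#define DEMO_PAUSE_SCOPED_ON(scope) demo_pause_scoped_impl(scope)
/* "Disabled" variant, shaped like the INTEL_NO_ITTNOTIFY_API branch:
 * the call and its argument vanish at preprocessing time. */
#define DEMO_PAUSE_SCOPED_OFF(scope)
#define DEMO_PAUSE_SCOPED_OFF_ptr 0

static void demo_run_scoped(void)
{
    DEMO_PAUSE_SCOPED_OFF(demo_pick_scope()); /* expands to an empty statement */
    DEMO_PAUSE_SCOPED_ON(demo_pick_scope());  /* evaluates the argument once */
}
```

Passing a plain constant such as `__itt_collection_scope_host` avoids the difference entirely. Because `__itt_collection_scope_all` is `0x7FFFFFFF`, it includes both the host and offload bits.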
/** @} control group */
/** @endcond */
/**
* @defgroup Intel Processor Trace control
* API from this group provides control over collection and analysis of Intel Processor Trace (Intel PT) data
* Information about Intel Processor Trace technology can be found here (Volume 3 chapter 35):
* https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
* Use this API to mark specific code regions for which detailed performance statistics should be collected.
* This mode makes your analysis faster and more accurate.
* @{
*/
typedef unsigned char __itt_pt_region;
/**
* @brief Saves a region name for use with the Intel PT API and returns a region id.
* At most 7 names can be registered; further registration attempts are ignored and a region id with an automatically generated name is returned.
* Pass NULL as the parameter to have the region named automatically.
*/
#if ITT_PLATFORM==ITT_PLATFORM_WIN
__itt_pt_region ITTAPI __itt_pt_region_createA(const char *name);
__itt_pt_region ITTAPI __itt_pt_region_createW(const wchar_t *name);
#if defined(UNICODE) || defined(_UNICODE)
# define __itt_pt_region_create __itt_pt_region_createW
#else /* UNICODE */
# define __itt_pt_region_create __itt_pt_region_createA
#endif /* UNICODE */
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
__itt_pt_region ITTAPI __itt_pt_region_create(const char *name);
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUB(ITTAPI, __itt_pt_region, pt_region_createA, (const char *name))
ITT_STUB(ITTAPI, __itt_pt_region, pt_region_createW, (const wchar_t *name))
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUB(ITTAPI, __itt_pt_region, pt_region_create, (const char *name))
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_pt_region_createA ITTNOTIFY_DATA(pt_region_createA)
#define __itt_pt_region_createA_ptr ITTNOTIFY_NAME(pt_region_createA)
#define __itt_pt_region_createW ITTNOTIFY_DATA(pt_region_createW)
#define __itt_pt_region_createW_ptr ITTNOTIFY_NAME(pt_region_createW)
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_pt_region_create ITTNOTIFY_DATA(pt_region_create)
#define __itt_pt_region_create_ptr ITTNOTIFY_NAME(pt_region_create)
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#else /* INTEL_NO_ITTNOTIFY_API */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_pt_region_createA(name) (__itt_pt_region)0
#define __itt_pt_region_createA_ptr 0
#define __itt_pt_region_createW(name) (__itt_pt_region)0
#define __itt_pt_region_createW_ptr 0
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_pt_region_create(name) (__itt_pt_region)0
#define __itt_pt_region_create_ptr 0
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_pt_region_createA_ptr 0
#define __itt_pt_region_createW_ptr 0
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_pt_region_create_ptr 0
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
/**
* @brief Marks the beginning of a code region targeted for Intel PT analysis.
* The function contains a special code pattern that is identified at the post-processing stage.
* @param[in] region - region id, 0 <= region < 8
*/
void __itt_mark_pt_region_begin(__itt_pt_region region);
/**
* @brief Marks the end of a code region targeted for Intel PT analysis.
* The function contains a special code pattern that is identified at the post-processing stage.
* @param[in] region - region id, 0 <= region < 8
*/
void __itt_mark_pt_region_end(__itt_pt_region region);
/** @} Intel PT control group*/
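A typical use of the Intel PT region API registers a name once, then brackets the code of interest with `__itt_mark_pt_region_begin` and `__itt_mark_pt_region_end`.

The real functions are only meaningful under a collector. The sketch below therefore uses hypothetical `demo_*` stand-ins with the same shapes (an `unsigned char` region id; NULL requests an automatic name) and simply tracks whether begin and end calls are balanced. The fixed id returned by `demo_pt_region_create` is an assumption, not the runtime's allocation policy.

```c
typedef unsigned char demo_pt_region;  /* same underlying type as __itt_pt_region */

static int demo_open_regions = 0;  /* begin/end balance, used to check the pattern */

/* Hypothetical stand-ins; the real functions are resolved by the ITT runtime. */
static demo_pt_region demo_pt_region_create(const char *name)
{
    (void)name;  /* NULL would request an automatically generated name */
    return 0;    /* assumption: a fixed id is enough for this sketch */
}
static void demo_mark_pt_region_begin(demo_pt_region r) { (void)r; ++demo_open_regions; }
static void demo_mark_pt_region_end(demo_pt_region r) { (void)r; --demo_open_regions; }

static long demo_hot_loop(long n)
{
    /* Register the name once; later calls reuse the same region id. */
    static int registered = 0;
    static demo_pt_region region;
    long sum = 0;

    if (!registered) {
        region = demo_pt_region_create("hot_loop");
        registered = 1;
    }

    demo_mark_pt_region_begin(region);  /* start of the Intel PT region */
    for (long i = 0; i < n; ++i)
        sum += i;
    demo_mark_pt_region_end(region);    /* end of the Intel PT region */
    return sum;
}
```

Lazy registration through a plain static flag is not thread-safe. In multithreaded code, the name would more likely be registered once at startup.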
/**
* @defgroup threads Threads
* @ingroup public
@@ -541,14 +615,26 @@ ITT_STUBV(ITTAPI, void, suppress_pop, (void))
/** @endcond */
/**
* @enum __itt_model_disable
* @brief Enumerator for the disable methods
* @enum __itt_suppress_mode
* @brief Enumerator for the suppressing modes
*/
typedef enum __itt_suppress_mode {
__itt_unsuppress_range,
__itt_suppress_range
} __itt_suppress_mode_t;
/**
* @enum __itt_collection_state
* @brief Enumerator for collection state.
*/
typedef enum {
__itt_collection_uninitialized = 0, /* uninitialized */
__itt_collection_init_fail = 1, /* initialization failed */
__itt_collection_collector_absent = 2, /* non-working state: collector is absent */
__itt_collection_collector_exists = 3, /* working state: collector exists */
__itt_collection_init_successful = 4 /* initialization succeeded */
} __itt_collection_state;
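`__itt_collection_state` is the return type of `__itt_get_collection_state()`, which is declared further down in this header. Code can use it to check whether a collector is attached before doing work that only matters under a profiler.

The self-contained sketch below uses a local copy of the enum values and a hypothetical `demo_get_state` in place of the real query. Treating only `__itt_collection_collector_exists` as "collector attached" is an assumption. Whether `__itt_collection_init_successful` should also count is left to the caller.

```c
/* Local copy of the __itt_collection_state values shown above. */
typedef enum {
    demo_uninitialized = 0,
    demo_init_fail = 1,
    demo_collector_absent = 2,
    demo_collector_exists = 3,
    demo_init_successful = 4
} demo_collection_state;

/* Hypothetical state holder standing in for the ITT runtime. */
static demo_collection_state demo_state = demo_uninitialized;

static demo_collection_state demo_get_state(void) { return demo_state; }

/* Human-readable label, e.g. for a startup log line. */
static const char *demo_state_name(demo_collection_state s)
{
    switch (s) {
    case demo_uninitialized:    return "uninitialized";
    case demo_init_fail:        return "initialization failed";
    case demo_collector_absent: return "collector absent";
    case demo_collector_exists: return "collector present";
    case demo_init_successful:  return "initialized";
    }
    return "unknown";
}

/* Only build expensive annotations when a collector is attached. */
static int demo_should_annotate(void)
{
    return demo_get_state() == demo_collector_exists;
}
```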
/**
* @brief Mark a range of memory for error suppression or unsuppression for error types included in mask
*/
@@ -1496,7 +1582,7 @@ ITT_STUBV(ITTAPI, void, heap_allocate_end, (__itt_heap_function h, void** addr,
/** @endcond */
/**
* @brief Record an free begin occurrence.
* @brief Record a free begin occurrence.
*/
void ITTAPI __itt_heap_free_begin(__itt_heap_function h, void* addr);
@@ -1516,7 +1602,7 @@ ITT_STUBV(ITTAPI, void, heap_free_begin, (__itt_heap_function h, void* addr))
/** @endcond */
/**
* @brief Record an free end occurrence.
* @brief Record a free end occurrence.
*/
void ITTAPI __itt_heap_free_end(__itt_heap_function h, void* addr);
@@ -1536,7 +1622,7 @@ ITT_STUBV(ITTAPI, void, heap_free_end, (__itt_heap_function h, void* addr))
/** @endcond */
/**
* @brief Record an reallocation begin occurrence.
* @brief Record a reallocation begin occurrence.
*/
void ITTAPI __itt_heap_reallocate_begin(__itt_heap_function h, void* addr, size_t new_size, int initialized);
@@ -1556,7 +1642,7 @@ ITT_STUBV(ITTAPI, void, heap_reallocate_begin, (__itt_heap_function h, void* add
/** @endcond */
/**
* @brief Record an reallocation end occurrence.
* @brief Record a reallocation end occurrence.
*/
void ITTAPI __itt_heap_reallocate_end(__itt_heap_function h, void* addr, void** new_addr, size_t new_size, int initialized);
@@ -2692,7 +2778,7 @@ ITT_STUB(ITTAPI, __itt_clock_domain*, clock_domain_create, (__itt_get_clock_info
/**
* @ingroup clockdomains
* @brief Recalculate clock domains frequences and clock base timestamps.
* @brief Recalculate clock domains frequencies and clock base timestamps.
*/
void ITTAPI __itt_clock_domain_reset(void);
@@ -3597,11 +3683,12 @@ ITT_STUBV(ITTAPI, void, enable_attach, (void))
/** @endcond */
/**
* @brief Module load info
* This API is used to report necessary information in case of module relocation
* @param[in] start_addr - relocated module start address
* @param[in] end_addr - relocated module end address
* @param[in] path - file system path to the module
* @brief Module load notification
* This API is used to report necessary information in case of bypassing default system loader.
* Notification should be done immediately after this module is loaded into process memory.
* @param[in] start_addr - module start address
* @param[in] end_addr - module end address
* @param[in] path - file system full path to the module
*/
#if ITT_PLATFORM==ITT_PLATFORM_WIN
void ITTAPI __itt_module_loadA(void *start_addr, void *end_addr, const char *path);
@@ -3656,7 +3743,462 @@ ITT_STUB(ITTAPI, void, module_load, (void *start_addr, void *end_addr, const ch
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
/**
* @brief Report module unload
* This API is used to report necessary information in case of bypassing default system loader.
* Notification should be done just before the module is unloaded from process memory.
* @param[in] addr - base address of loaded module
*/
void ITTAPI __itt_module_unload(void *addr);
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
ITT_STUBV(ITTAPI, void, module_unload, (void *addr))
#define __itt_module_unload ITTNOTIFY_VOID(module_unload)
#define __itt_module_unload_ptr ITTNOTIFY_NAME(module_unload)
#else /* INTEL_NO_ITTNOTIFY_API */
#define __itt_module_unload(addr)
#define __itt_module_unload_ptr 0
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#define __itt_module_unload_ptr 0
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
/** @cond exclude_from_documentation */
typedef enum
{
__itt_module_type_unknown = 0,
__itt_module_type_elf,
__itt_module_type_coff
} __itt_module_type;
/** @endcond */
/** @cond exclude_from_documentation */
typedef enum
{
itt_section_type_unknown,
itt_section_type_bss, /* notifies that the section contains uninitialized data. These are the relevant section types and the modules that contain them:
* ELF module: SHT_NOBITS section type
* COFF module: IMAGE_SCN_CNT_UNINITIALIZED_DATA section type
*/
itt_section_type_data, /* notifies that the section contains initialized data. These are the relevant section types and the modules that contain them:
* ELF module: SHT_PROGBITS section type
* COFF module: IMAGE_SCN_CNT_INITIALIZED_DATA section type
*/
itt_section_type_text /* notifies that the section contains executable code. These are the relevant section types and the modules that contain them:
* ELF module: SHT_PROGBITS section type
* COFF module: IMAGE_SCN_CNT_CODE section type
*/
} __itt_section_type;
/** @endcond */
/**
* @hideinitializer
* @brief bit-mask, detects a section attribute that indicates whether a section can be executed as code.
* These are the relevant section attributes and the modules that contain them:
* ELF module: PF_X section attribute
* COFF module: IMAGE_SCN_MEM_EXECUTE attribute
*/
#define __itt_section_exec 0x20000000
/**
* @hideinitializer
* @brief bit-mask, detects a section attribute that indicates whether a section can be read.
* These are the relevant section attributes and the modules that contain them:
* ELF module: PF_R attribute
* COFF module: IMAGE_SCN_MEM_READ attribute
*/
#define __itt_section_read 0x40000000
/**
* @hideinitializer
* @brief bit-mask, detects a section attribute that indicates whether a section can be written to.
* These are the relevant section attributes and the modules that contain them:
* ELF module: PF_W attribute
* COFF module: IMAGE_SCN_MEM_WRITE attribute
*/
#define __itt_section_write 0x80000000
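The three `__itt_section_*` masks occupy the top bits of a section's `flags` word (0x20000000, 0x40000000 and 0x80000000). A section's permissions are expressed by OR-ing them together.

The short sketch below uses local copies of those values. The helper `demo_flags_for` and its mapping from section kind to permissions are illustrative assumptions: code is read+execute, while data and bss are read+write.

```c
#include <stddef.h>

/* Local copies of the bit masks defined above. */
#define DEMO_SECTION_EXEC  0x20000000u
#define DEMO_SECTION_READ  0x40000000u
#define DEMO_SECTION_WRITE 0x80000000u

/* Local copy of the section kinds from __itt_section_type (same order). */
typedef enum { demo_sec_unknown, demo_sec_bss, demo_sec_data, demo_sec_text } demo_section_type;

/* Illustrative permission mapping for typical sections. */
static size_t demo_flags_for(demo_section_type t)
{
    switch (t) {
    case demo_sec_text: return DEMO_SECTION_READ | DEMO_SECTION_EXEC;
    case demo_sec_data:
    case demo_sec_bss:  return DEMO_SECTION_READ | DEMO_SECTION_WRITE;
    default:            return 0;
    }
}

/* Tests a single attribute bit in a flags word. */
static int demo_is_executable(size_t flags) { return (flags & DEMO_SECTION_EXEC) != 0; }
```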
/** @cond exclude_from_documentation */
#pragma pack(push, 8)
typedef struct ___itt_section_info
{
const char* name; /*!< Section name in UTF8 */
__itt_section_type type; /*!< Section content and semantics description */
size_t flags; /*!< Section bit flags that describe attributes using bit mask
* Zero if disabled, non-zero if enabled
*/
void* start_addr; /*!< Section load (relocated) start address */
size_t size; /*!< Section size */
size_t file_offset; /*!< Section file offset */
} __itt_section_info;
#pragma pack(pop)
/** @endcond */
/** @cond exclude_from_documentation */
#pragma pack(push, 8)
typedef struct ___itt_module_object
{
unsigned int version; /*!< API version*/
__itt_id module_id; /*!< Unique identifier. This is unchanged for sections that belong to the same module */
__itt_module_type module_type; /*!< Binary module format */
const char* module_name; /*!< Unique module name or path to module in UTF8
* Contains module name when module_buffer and module_size exist
* Contains module path when module_buffer and module_size are absent
* module_name remains the same for a given module_id
*/
void* module_buffer; /*!< Module buffer content */
size_t module_size; /*!< Module buffer size */
/*!< If module_buffer and module_size exist, the binary module is dumped onto the system.
* If module_buffer and module_size do not exist,
* the binary module exists on the system already.
* The module_name parameter contains the path to the module.
*/
__itt_section_info* section_array; /*!< Reference to section information */
size_t section_number; /*!< Number of entries in section_array */
} __itt_module_object;
#pragma pack(pop)
/** @endcond */
/**
* @brief Load module content and its loaded (relocated) sections.
* This API is useful to save a module, or specify its location on the system and report information about loaded sections.
* The target module is saved on the system if module buffer content and size are available.
* If module buffer content and size are unavailable, the module name contains the path to the existing binary module.
* @param[in] module_obj - provides module and section information, along with unique module identifiers (name,module ID)
* which bind the binary module to particular sections.
*/
void ITTAPI __itt_module_load_with_sections(__itt_module_object* module_obj);
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
ITT_STUBV(ITTAPI, void, module_load_with_sections, (__itt_module_object* module_obj))
#define __itt_module_load_with_sections ITTNOTIFY_VOID(module_load_with_sections)
#define __itt_module_load_with_sections_ptr ITTNOTIFY_NAME(module_load_with_sections)
#else /* INTEL_NO_ITTNOTIFY_API */
#define __itt_module_load_with_sections(module_obj)
#define __itt_module_load_with_sections_ptr 0
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#define __itt_module_load_with_sections_ptr 0
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
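The sketch below prepares data in the shape that `__itt_module_load_with_sections` expects. It creates one entry per section in `section_array`, sets `section_number` to the entry count, and leaves `module_buffer`/`module_size` empty so that `module_name` is read as the path of an existing binary.

The structs are simplified local mirrors: the real `__itt_module_object` also carries `version`, an `__itt_id module_id` and `module_type`. The addresses, sizes, offsets and path are made-up values.

```c
#include <stddef.h>

/* Simplified mirror of ___itt_section_info. */
typedef struct {
    const char *name;   /* section name in UTF-8 */
    int type;           /* one of the __itt_section_type values */
    size_t flags;       /* __itt_section_read | __itt_section_exec | ... */
    void *start_addr;   /* load (relocated) start address */
    size_t size;        /* section size in bytes */
    size_t file_offset; /* offset of the section in the module file */
} demo_section_info;

/* Simplified mirror of ___itt_module_object (version, id and type omitted). */
typedef struct {
    const char *module_name;   /* a path, because no buffer is supplied */
    void *module_buffer;       /* NULL: the binary already exists on disk */
    size_t module_size;        /* 0 together with a NULL buffer */
    demo_section_info *section_array;
    size_t section_number;     /* number of entries in section_array */
} demo_module_object;

static unsigned char demo_text[256];  /* stand-in for relocated code */
static unsigned char demo_data[64];   /* stand-in for relocated data */

static demo_section_info demo_sections[] = {
    /* 3 == itt_section_type_text, flags == read | exec */
    { ".text", 3, 0x60000000u, demo_text, sizeof demo_text, 0x1000 },
    /* 2 == itt_section_type_data, flags == read | write */
    { ".data", 2, 0xC0000000u, demo_data, sizeof demo_data, 0x2000 },
};

static demo_module_object demo_make_module(void)
{
    demo_module_object m;
    m.module_name = "/opt/example/libjit_module.so";  /* hypothetical path */
    m.module_buffer = NULL;
    m.module_size = 0;
    m.section_array = demo_sections;
    m.section_number = sizeof demo_sections / sizeof demo_sections[0];
    return m;
}
```

The same object, with the same `module_name` and module id, would then be passed to `__itt_module_unload_with_sections` when the module goes away.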
/**
* @brief Unload a module and its loaded (relocated) sections.
* This API notifies that the module and its sections were unloaded.
* @param[in] module_obj - provides module and sections information, along with unique module identifiers (name,module ID)
* which bind the binary module to particular sections.
*/
void ITTAPI __itt_module_unload_with_sections(__itt_module_object* module_obj);
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
ITT_STUBV(ITTAPI, void, module_unload_with_sections, (__itt_module_object* module_obj))
#define __itt_module_unload_with_sections ITTNOTIFY_VOID(module_unload_with_sections)
#define __itt_module_unload_with_sections_ptr ITTNOTIFY_NAME(module_unload_with_sections)
#else /* INTEL_NO_ITTNOTIFY_API */
#define __itt_module_unload_with_sections(module_obj)
#define __itt_module_unload_with_sections_ptr 0
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#define __itt_module_unload_with_sections_ptr 0
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
/** @cond exclude_from_documentation */
#pragma pack(push, 8)
typedef struct ___itt_histogram
{
const __itt_domain* domain; /*!< Domain of the histogram*/
const char* nameA; /*!< Name of the histogram */
#if defined(UNICODE) || defined(_UNICODE)
const wchar_t* nameW;
#else /* UNICODE || _UNICODE */
void* nameW;
#endif /* UNICODE || _UNICODE */
__itt_metadata_type x_type; /*!< Type of the histogram X axis */
__itt_metadata_type y_type; /*!< Type of the histogram Y axis */
int extra1; /*!< Reserved for the runtime */
void* extra2; /*!< Reserved for the runtime */
struct ___itt_histogram* next;
} __itt_histogram;
#pragma pack(pop)
/** @endcond */
/**
* @brief Create a typed histogram instance with given name/domain.
* @param[in] domain The domain controlling the call.
* @param[in] name The name of the histogram.
* @param[in] x_type The type of the X axis in histogram (may be 0 to calculate batch statistics).
* @param[in] y_type The type of the Y axis in histogram.
*/
#if ITT_PLATFORM==ITT_PLATFORM_WIN
__itt_histogram* ITTAPI __itt_histogram_createA(const __itt_domain* domain, const char* name, __itt_metadata_type x_type, __itt_metadata_type y_type);
__itt_histogram* ITTAPI __itt_histogram_createW(const __itt_domain* domain, const wchar_t* name, __itt_metadata_type x_type, __itt_metadata_type y_type);
#if defined(UNICODE) || defined(_UNICODE)
# define __itt_histogram_create __itt_histogram_createW
# define __itt_histogram_create_ptr __itt_histogram_createW_ptr
#else /* UNICODE */
# define __itt_histogram_create __itt_histogram_createA
# define __itt_histogram_create_ptr __itt_histogram_createA_ptr
#endif /* UNICODE */
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
__itt_histogram* ITTAPI __itt_histogram_create(const __itt_domain* domain, const char* name, __itt_metadata_type x_type, __itt_metadata_type y_type);
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUB(ITTAPI, __itt_histogram*, histogram_createA, (const __itt_domain* domain, const char* name, __itt_metadata_type x_type, __itt_metadata_type y_type))
ITT_STUB(ITTAPI, __itt_histogram*, histogram_createW, (const __itt_domain* domain, const wchar_t* name, __itt_metadata_type x_type, __itt_metadata_type y_type))
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUB(ITTAPI, __itt_histogram*, histogram_create, (const __itt_domain* domain, const char* name, __itt_metadata_type x_type, __itt_metadata_type y_type))
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_histogram_createA ITTNOTIFY_DATA(histogram_createA)
#define __itt_histogram_createA_ptr ITTNOTIFY_NAME(histogram_createA)
#define __itt_histogram_createW ITTNOTIFY_DATA(histogram_createW)
#define __itt_histogram_createW_ptr ITTNOTIFY_NAME(histogram_createW)
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_histogram_create ITTNOTIFY_DATA(histogram_create)
#define __itt_histogram_create_ptr ITTNOTIFY_NAME(histogram_create)
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#else /* INTEL_NO_ITTNOTIFY_API */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_histogram_createA(domain, name, x_type, y_type) (__itt_histogram*)0
#define __itt_histogram_createA_ptr 0
#define __itt_histogram_createW(domain, name, x_type, y_type) (__itt_histogram*)0
#define __itt_histogram_createW_ptr 0
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_histogram_create(domain, name, x_type, y_type) (__itt_histogram*)0
#define __itt_histogram_create_ptr 0
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_histogram_createA_ptr 0
#define __itt_histogram_createW_ptr 0
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_histogram_create_ptr 0
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
/**
* @brief Submit statistics for a histogram instance.
* @param[in] hist Pointer to the histogram instance to which the histogram statistic is to be dumped.
* @param[in] length The number of elements in dumped axis data array.
* @param[in] x_data The X axis dumped data itself (may be NULL to calculate batch statistics).
* @param[in] y_data The Y axis dumped data itself.
*/
void ITTAPI __itt_histogram_submit(__itt_histogram* hist, size_t length, void* x_data, void* y_data);
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
ITT_STUBV(ITTAPI, void, histogram_submit, (__itt_histogram* hist, size_t length, void* x_data, void* y_data))
#define __itt_histogram_submit ITTNOTIFY_VOID(histogram_submit)
#define __itt_histogram_submit_ptr ITTNOTIFY_NAME(histogram_submit)
#else /* INTEL_NO_ITTNOTIFY_API */
#define __itt_histogram_submit(hist, length, x_data, y_data)
#define __itt_histogram_submit_ptr 0
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#define __itt_histogram_submit_ptr 0
#endif /* INTEL_NO_MACRO_BODY */
/**
* @brief Obtain the current collection state
* @return collection state as an enum __itt_collection_state
*/
__itt_collection_state __itt_get_collection_state(void);
/**
* @brief Release resources allocated by the static part of the ITT API
* This API should be called from the library destructor
* @return void
*/
void __itt_release_resources(void);
/** @endcond */
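`__itt_histogram_submit`, declared above, reads `length` elements from each axis array. `x_data` may be NULL to request batch statistics over `y_data` alone; this matches creating the histogram with an X axis type of 0.

The self-contained sketch below uses a hypothetical `demo_histogram_submit` with the same parameter list. The stand-in only accumulates what it receives, so the calling convention can be exercised without a collector. Using `unsigned long long` elements (as for a 64-bit unsigned metadata type) is an assumption.

```c
#include <stddef.h>

static size_t demo_points_seen = 0;
static unsigned long long demo_y_total = 0;

/* Hypothetical stand-in with the same parameters as __itt_histogram_submit:
 * it consumes `length` elements from the Y axis and ignores the X axis. */
static void demo_histogram_submit(void *hist, size_t length, void *x_data, void *y_data)
{
    const unsigned long long *y = (const unsigned long long *)y_data;
    (void)hist;
    (void)x_data;  /* may be NULL for batch statistics */
    for (size_t i = 0; i < length; ++i)
        demo_y_total += y[i];
    demo_points_seen += length;
}

static void demo_submit_samples(void)
{
    unsigned long long bucket[4] = { 0, 1, 2, 3 };    /* X axis: bucket index */
    unsigned long long count[4]  = { 5, 9, 4, 1 };    /* Y axis: samples per bucket */
    unsigned long long batch[3]  = { 120, 80, 100 };  /* Y values only */

    demo_histogram_submit(NULL, 4, bucket, count);  /* paired X/Y data */
    demo_histogram_submit(NULL, 3, NULL, batch);    /* X omitted: batch statistics */
}
```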
/**
* @brief Create a typed counter with given domain pointer, string name and counter type
*/
#if ITT_PLATFORM==ITT_PLATFORM_WIN
__itt_counter ITTAPI __itt_counter_createA_v3(const __itt_domain* domain, const char* name, __itt_metadata_type type);
__itt_counter ITTAPI __itt_counter_createW_v3(const __itt_domain* domain, const wchar_t* name, __itt_metadata_type type);
#if defined(UNICODE) || defined(_UNICODE)
# define __itt_counter_create_v3 __itt_counter_createW_v3
# define __itt_counter_create_v3_ptr __itt_counter_createW_v3_ptr
#else /* UNICODE */
# define __itt_counter_create_v3 __itt_counter_createA_v3
# define __itt_counter_create_v3_ptr __itt_counter_createA_v3_ptr
#endif /* UNICODE */
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
__itt_counter ITTAPI __itt_counter_create_v3(const __itt_domain* domain, const char* name, __itt_metadata_type type);
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUB(ITTAPI, __itt_counter, counter_createA_v3, (const __itt_domain* domain, const char* name, __itt_metadata_type type))
ITT_STUB(ITTAPI, __itt_counter, counter_createW_v3, (const __itt_domain* domain, const wchar_t* name, __itt_metadata_type type))
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUB(ITTAPI, __itt_counter, counter_create_v3, (const __itt_domain* domain, const char* name, __itt_metadata_type type))
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_counter_createA_v3 ITTNOTIFY_DATA(counter_createA_v3)
#define __itt_counter_createA_v3_ptr ITTNOTIFY_NAME(counter_createA_v3)
#define __itt_counter_createW_v3 ITTNOTIFY_DATA(counter_createW_v3)
#define __itt_counter_createW_v3_ptr ITTNOTIFY_NAME(counter_createW_v3)
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_counter_create_v3 ITTNOTIFY_DATA(counter_create_v3)
#define __itt_counter_create_v3_ptr ITTNOTIFY_NAME(counter_create_v3)
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#else /* INTEL_NO_ITTNOTIFY_API */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_counter_createA_v3(domain, name, type) (__itt_counter)0
#define __itt_counter_createA_v3_ptr 0
#define __itt_counter_createW_v3(domain, name, type) (__itt_counter)0
#define __itt_counter_createW_v3_ptr 0
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_counter_create_v3(domain, name, type) (__itt_counter)0
#define __itt_counter_create_v3_ptr 0
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define __itt_counter_createA_v3_ptr 0
#define __itt_counter_createW_v3_ptr 0
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#define __itt_counter_create_v3_ptr 0
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
/**
* @brief Set the value of a counter created with __itt_counter_create_v3
*/
void ITTAPI __itt_counter_set_value_v3(__itt_counter counter, void *value_ptr);
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
ITT_STUBV(ITTAPI, void, counter_set_value_v3, (__itt_counter counter, void *value_ptr))
#define __itt_counter_set_value_v3 ITTNOTIFY_VOID(counter_set_value_v3)
#define __itt_counter_set_value_v3_ptr ITTNOTIFY_NAME(counter_set_value_v3)
#else /* INTEL_NO_ITTNOTIFY_API */
#define __itt_counter_set_value_v3(counter, value_ptr)
#define __itt_counter_set_value_v3_ptr 0
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#define __itt_counter_set_value_v3_ptr 0
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
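`__itt_counter_set_value_v3` receives the new value through a `void *`. Presumably, the pointed-to object must match the `__itt_metadata_type` that the counter was created with in `__itt_counter_create_v3`.

The sketch below models that assumed contract with a hypothetical tagged stand-in. The `demo_*` type tags and functions are not part of the ITT API.

```c
#include <stdint.h>

/* Hypothetical type tags, in the spirit of __itt_metadata_type. */
typedef enum { demo_u64, demo_double } demo_value_type;

typedef struct {
    demo_value_type type;  /* fixed when the counter is created */
    uint64_t u64;          /* last value when type == demo_u64 */
    double dbl;            /* last value when type == demo_double */
} demo_counter;

/* Assumed contract: *value_ptr is read according to the counter's declared type. */
static void demo_counter_set_value(demo_counter *c, void *value_ptr)
{
    if (c->type == demo_u64)
        c->u64 = *(const uint64_t *)value_ptr;
    else
        c->dbl = *(const double *)value_ptr;
}

static void demo_update_counters(demo_counter *bytes, demo_counter *ratio)
{
    uint64_t transferred = 4096;  /* must be uint64_t for a u64 counter */
    double hit_ratio = 0.75;      /* must be double for a double counter */
    demo_counter_set_value(bytes, &transferred);
    demo_counter_set_value(ratio, &hit_ratio);
}
```

Passing, for example, the address of an `int` to a 64-bit counter would make the reader access memory past the object. A typed local variable of the exact type avoids this.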
/**
* @brief describes the type of context metadata
*/
typedef enum {
__itt_context_unknown = 0, /*!< Undefined type */
__itt_context_nameA, /*!< ASCII string char* type */
__itt_context_nameW, /*!< Unicode string wchar_t* type */
__itt_context_deviceA, /*!< ASCII string char* type */
__itt_context_deviceW, /*!< Unicode string wchar_t* type */
__itt_context_unitsA, /*!< ASCII string char* type */
__itt_context_unitsW, /*!< Unicode string wchar_t* type */
__itt_context_pci_addrA, /*!< ASCII string char* type */
__itt_context_pci_addrW, /*!< Unicode string wchar_t* type */
__itt_context_tid, /*!< Unsigned 64-bit integer type */
__itt_context_max_val, /*!< Unsigned 64-bit integer type */
__itt_context_bandwidth_flag, /*!< Unsigned 64-bit integer type */
__itt_context_latency_flag, /*!< Unsigned 64-bit integer type */
__itt_context_occupancy_flag, /*!< Unsigned 64-bit integer type */
__itt_context_on_thread_flag, /*!< Unsigned 64-bit integer type */
__itt_context_is_abs_val_flag, /*!< Unsigned 64-bit integer type */
__itt_context_cpu_instructions_flag, /*!< Unsigned 64-bit integer type */
__itt_context_cpu_cycles_flag /*!< Unsigned 64-bit integer type */
} __itt_context_type;
#if defined(UNICODE) || defined(_UNICODE)
# define __itt_context_name __itt_context_nameW
# define __itt_context_device __itt_context_deviceW
# define __itt_context_units __itt_context_unitsW
# define __itt_context_pci_addr __itt_context_pci_addrW
#else /* UNICODE || _UNICODE */
# define __itt_context_name __itt_context_nameA
# define __itt_context_device __itt_context_deviceA
# define __itt_context_units __itt_context_unitsA
# define __itt_context_pci_addr __itt_context_pci_addrA
#endif /* UNICODE || _UNICODE */
/** @cond exclude_from_documentation */
#pragma pack(push, 8)
typedef struct ___itt_context_metadata
{
__itt_context_type type; /*!< Type of the context metadata value */
void* value; /*!< Pointer to context metadata value itself */
} __itt_context_metadata;
#pragma pack(pop)
/** @endcond */
/** @cond exclude_from_documentation */
#pragma pack(push, 8)
typedef struct ___itt_counter_metadata
{
__itt_counter counter; /*!< Associated context metadata counter */
__itt_context_type type; /*!< Type of the context metadata value */
const char* str_valueA; /*!< String context metadata value */
#if defined(UNICODE) || defined(_UNICODE)
const wchar_t* str_valueW;
#else /* UNICODE || _UNICODE */
void* str_valueW;
#endif /* UNICODE || _UNICODE */
unsigned long long value; /*!< Numeric context metadata value */
int extra1; /*!< Reserved for the runtime */
void* extra2; /*!< Reserved for the runtime */
struct ___itt_counter_metadata* next;
} __itt_counter_metadata;
#pragma pack(pop)
/** @endcond */
/**
* @brief Bind context metadata to counter instance
* @param[in] counter Pointer to the counter instance to which the context metadata is to be associated.
* @param[in] length The number of elements in context metadata array.
* @param[in] metadata The context metadata itself.
*/
void ITTAPI __itt_bind_context_metadata_to_counter(__itt_counter counter, size_t length, __itt_context_metadata* metadata);
/** @cond exclude_from_documentation */
#ifndef INTEL_NO_MACRO_BODY
#ifndef INTEL_NO_ITTNOTIFY_API
ITT_STUBV(ITTAPI, void, bind_context_metadata_to_counter, (__itt_counter counter, size_t length, __itt_context_metadata* metadata))
#define __itt_bind_context_metadata_to_counter ITTNOTIFY_VOID(bind_context_metadata_to_counter)
#define __itt_bind_context_metadata_to_counter_ptr ITTNOTIFY_NAME(bind_context_metadata_to_counter)
#else /* INTEL_NO_ITTNOTIFY_API */
#define __itt_bind_context_metadata_to_counter(counter, length, metadata)
#define __itt_bind_context_metadata_to_counter_ptr 0
#endif /* INTEL_NO_ITTNOTIFY_API */
#else /* INTEL_NO_MACRO_BODY */
#define __itt_bind_context_metadata_to_counter_ptr 0
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
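`__itt_bind_context_metadata_to_counter` takes an array of `{type, value}` pairs. Per the comments on `__itt_context_type`, `value` points at a string for the `*A`/`*W` entries and at an unsigned 64-bit integer for the numeric and flag entries.

The sketch below builds such an array. It uses local mirrors of a few enum values (same numbering as the header) and a hypothetical `demo_bind` that only counts entries with a value pointer. The device name, units, maximum value, and the reading of a flag as "non-zero means set" are assumptions. The values are kept `static` because this excerpt does not say whether the runtime copies them.

```c
#include <stddef.h>

/* Local mirror of a few __itt_context_type values (same numbering as above). */
typedef enum {
    demo_ctx_deviceA = 3,
    demo_ctx_unitsA = 5,
    demo_ctx_max_val = 10,
    demo_ctx_bandwidth_flag = 11
} demo_context_type;

/* Mirror of __itt_context_metadata: a type tag plus a pointer to the value. */
typedef struct {
    demo_context_type type;
    void *value;
} demo_context_metadata;

/* Hypothetical stand-in for the bind call: counts entries whose value pointer is set. */
static size_t demo_bind(size_t length, const demo_context_metadata *metadata)
{
    size_t valid = 0;
    for (size_t i = 0; i < length; ++i)
        if (metadata[i].value != NULL)
            ++valid;
    return valid;
}

static size_t demo_describe_counter(void)
{
    static char device[] = "gpu0";              /* ASCII string value */
    static char units[] = "MB/s";               /* ASCII string value */
    static unsigned long long max_val = 25600;  /* unsigned 64-bit value */
    static unsigned long long bandwidth = 1;    /* flag, assumed non-zero == set */

    demo_context_metadata md[] = {
        { demo_ctx_deviceA, device },
        { demo_ctx_unitsA, units },
        { demo_ctx_max_val, &max_val },
        { demo_ctx_bandwidth_flag, &bandwidth },
    };
    return demo_bind(sizeof md / sizeof md[0], md);
}
```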
#ifdef __cplusplus
}
@@ -4005,7 +4547,7 @@ ITT_STUB(ITTAPI, __itt_caller, stack_caller_create, (void))
/** @endcond */
/**
* @brief Destroy the inforamtion about stitch point identified by the pointer previously returned by __itt_stack_caller_create()
* @brief Destroy the information about stitch point identified by the pointer previously returned by __itt_stack_caller_create()
*/
void ITTAPI __itt_stack_caller_destroy(__itt_caller id);


@@ -1,60 +1,8 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#ifndef __JITPROFILING_H__
#define __JITPROFILING_H__
@@ -66,7 +14,7 @@
* generated code that can be used by performance tools. The user inserts
* calls in the code generator to report information before JIT-compiled
* code goes to execution. This information is collected at runtime and used
* by tools like Intel(R) VTune(TM) Amplifier to display performance metrics
* by tools like Intel(R) VTune(TM) Profiler to display performance metrics
* associated with JIT-compiled code.
*
* These APIs can be used to\n
@@ -97,16 +45,16 @@
* * Expected behavior:
* * If any iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED event overwrites an
* already reported method, then such a method becomes invalid and its
* memory region is treated as unloaded. VTune Amplifier displays the metrics
* memory region is treated as unloaded. VTune Profiler displays the metrics
* collected by the method until it is overwritten.
* * If supplied line number information contains multiple source lines for
* the same assembly instruction (code location), then VTune Amplifier picks up
* the same assembly instruction (code location), then VTune Profiler picks up
* the first line number.
* * Dynamically generated code can be associated with a module name.
* Use the iJIT_Method_Load_V2 structure.\n
* Clarification of some cases:
* * If you register a function with the same method ID multiple times,
* specifying different module names, then the VTune Amplifier picks up
* specifying different module names, then the VTune Profiler picks up
* the module name registered first. If you want to distinguish the same
* function between different JIT engines, supply different method IDs for
* each function. Other symbolic information (for example, source file)
@@ -143,18 +91,18 @@
* belonging to the same method. Symbolic information (method name,
* source file name) will be taken from the first notification, and all
* subsequent notifications with the same method ID will be processed
* only for line number table information. So, the VTune Amplifier will map
* only for line number table information. So, the VTune Profiler will map
* samples to a source line using the line number table from the current
* notification while taking the source file name from the very first one.\n
* Clarification of some cases:\n
* * If you register a second code region with a different source file
* name and the same method ID, then this information will be saved and
* will not be considered as an extension of the first code region, but
* VTune Amplifier will use the source file of the first code region and map
* VTune Profiler will use the source file of the first code region and map
* performance metrics incorrectly.
* * If you register a second code region with the same source file as
* for the first region and the same method ID, then the source file will be
* discarded but VTune Amplifier will map metrics to the source file correctly.
* discarded but VTune Profiler will map metrics to the source file correctly.
* * If you register a second code region with a null source file and
* the same method ID, then provided line number info will be associated
* with the source file of the first code region.
@@ -293,7 +241,7 @@ typedef enum _iJIT_IsProfilingActiveFlags
* @brief Description of a single entry in the line number information of a code region.
* @details A table of line number entries gives information about how the reported code region
* is mapped to source file.
* Intel(R) VTune(TM) Amplifier uses line number information to attribute
* Intel(R) VTune(TM) Profiler uses line number information to attribute
* the samples (virtual address) to a line number. \n
* It is acceptable to report different code addresses for the same source line:
* @code
@@ -304,7 +252,7 @@ typedef enum _iJIT_IsProfilingActiveFlags
* 18 1
* 21 30
*
* VTune Amplifier constructs the following table using the client data
* VTune Profiler constructs the following table using the client data
*
* Code subrange Line number
* 0-1 2
@@ -428,7 +376,7 @@ typedef struct _iJIT_Method_Load_V2
char* module_name; /**<\brief Module name. Can be NULL.
The module name can be useful for distinguishing among
different JIT engines. VTune Amplifier will display
different JIT engines. VTune Profiler will display
reported methods grouped by specific module. */
} *piJIT_Method_Load_V2, iJIT_Method_Load_V2;
@@ -480,7 +428,7 @@ typedef struct _iJIT_Method_Load_V3
char* module_name; /**<\brief Module name. Can be NULL.
* The module name can be useful for distinguishing among
* different JIT engines. VTune Amplifier will display
* different JIT engines. VTune Profiler will display
* reported methods grouped by specific module. */
iJIT_CodeArchitecture module_arch; /**<\brief Architecture of the method's code region.
@@ -490,9 +438,9 @@ typedef struct _iJIT_Method_Load_V3
* engine generates 64-bit code.
*
* If JIT engine reports both 32-bit and 64-bit types
* of methods then VTune Amplifier splits the methods
* of methods then VTune Profiler splits the methods
* with the same module name but with different
* architectures in two different modules. VTune Amplifier
* architectures in two different modules. VTune Profiler
* modifies the original name provided with a 64-bit method
* version by ending it with '(64)' */
@@ -561,9 +509,9 @@ typedef enum _iJIT_SegmentType
iJIT_CT_CODE, /**<\brief Executable code. */
iJIT_CT_DATA, /**<\brief Data (not executable code).
* VTune Amplifier uses the format string
* VTune Profiler uses the format string
* (see iJIT_Method_Update) to represent
* this data in the VTune Amplifier GUI */
* this data in the VTune Profiler GUI */
iJIT_CT_KEEP, /**<\brief Use the previous markup for the trace.
* Can be used for the following
@@ -580,11 +528,11 @@ typedef enum _iJIT_SegmentType
* structure to describe the update of the content within a JIT-compiled method,
* use iJVM_EVENT_TYPE_METHOD_UPDATE_V2 as an event type to report it.
*
* On the first Update event, VTune Amplifier copies the original code range reported by
* On the first Update event, VTune Profiler copies the original code range reported by
* the iJVM_EVENT_TYPE_METHOD_LOAD event, then modifies it with the supplied bytes and
* adds the modified range to the original method. For next update events, VTune Amplifier
* adds the modified range to the original method. For next update events, VTune Profiler
* does the same but it uses the latest modified version of a code region for update.
* Eventually, VTune Amplifier GUI displays multiple code ranges for the method reported by
* Eventually, VTune Profiler GUI displays multiple code ranges for the method reported by
* the iJVM_EVENT_TYPE_METHOD_LOAD event.
* Notes:
* - Multiple update events with different types for the same trace are allowed
@@ -673,7 +621,7 @@ iJIT_IsProfilingActiveFlags JITAPI iJIT_IsProfilingActive(void);
* @brief Reports infomation about JIT-compiled code to the agent.
*
* The reported information is used to attribute samples obtained from any
* Intel(R) VTune(TM) Amplifier collector. This API needs to be called
* Intel(R) VTune(TM) Profiler collector. This API needs to be called
* after JIT compilation and before the first entry into the JIT-compiled
* code.
*

@@ -1,60 +1,8 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#ifndef _LEGACY_ITTNOTIFY_H_
#define _LEGACY_ITTNOTIFY_H_
@@ -80,6 +28,10 @@
# define ITT_OS_FREEBSD 4
#endif /* ITT_OS_FREEBSD */
#ifndef ITT_OS_OPENBSD
# define ITT_OS_OPENBSD 5
#endif /* ITT_OS_OPENBSD */
#ifndef ITT_OS
# if defined WIN32 || defined _WIN32
# define ITT_OS ITT_OS_WIN
@@ -87,6 +39,8 @@
# define ITT_OS ITT_OS_MAC
# elif defined( __FreeBSD__ )
# define ITT_OS ITT_OS_FREEBSD
# elif defined( __OpenBSD__ )
# define ITT_OS ITT_OS_OPENBSD
# else
# define ITT_OS ITT_OS_LINUX
# endif
@@ -108,6 +62,10 @@
# define ITT_PLATFORM_FREEBSD 4
#endif /* ITT_PLATFORM_FREEBSD */
#ifndef ITT_PLATFORM_OPENBSD
# define ITT_PLATFORM_OPENBSD 5
#endif /* ITT_PLATFORM_OPENBSD */
#ifndef ITT_PLATFORM
# if ITT_OS==ITT_OS_WIN
# define ITT_PLATFORM ITT_PLATFORM_WIN
@@ -115,6 +73,8 @@
# define ITT_PLATFORM ITT_PLATFORM_MAC
# elif ITT_OS==ITT_OS_FREEBSD
# define ITT_PLATFORM ITT_PLATFORM_FREEBSD
# elif ITT_OS==ITT_OS_OPENBSD
# define ITT_PLATFORM ITT_PLATFORM_OPENBSD
# else
# define ITT_PLATFORM ITT_PLATFORM_POSIX
# endif
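The two detection chains in this file pick `ITT_OS` from predefined compiler macros and then derive `ITT_PLATFORM` from `ITT_OS`; this change slots OpenBSD in after FreeBSD, leaving Linux (and the POSIX platform) as the fallback. A minimal sketch of that selection order follows, written as ordinary C functions so it can be exercised at runtime, whereas the header resolves everything at preprocessing time. All `sketch_*` names are hypothetical. The values 4 and 5 come from the hunks above; the Windows, Linux/POSIX and macOS values and the `__APPLE__`/`__MACH__` test (its condition is outside the hunk context) are assumptions for illustration.

```c
#include <assert.h>

/* OS identifiers. FREEBSD (4) and OPENBSD (5) match the hunks above;
 * WIN, LINUX and MAC are assumed values for illustration. */
enum sketch_itt_os {
    SKETCH_OS_WIN     = 1,
    SKETCH_OS_LINUX   = 2,
    SKETCH_OS_MAC     = 3,
    SKETCH_OS_FREEBSD = 4,
    SKETCH_OS_OPENBSD = 5
};

/* Platform identifiers; again only FREEBSD and OPENBSD come from the diff. */
enum sketch_itt_platform {
    SKETCH_PLATFORM_WIN     = 1,
    SKETCH_PLATFORM_POSIX   = 2,
    SKETCH_PLATFORM_MAC     = 3,
    SKETCH_PLATFORM_FREEBSD = 4,
    SKETCH_PLATFORM_OPENBSD = 5
};

/* Same order as the ITT_OS chain: Windows, macOS, FreeBSD, OpenBSD,
 * and Linux for everything else. */
static int sketch_detect_os(void)
{
#if defined(WIN32) || defined(_WIN32)
    return SKETCH_OS_WIN;
#elif defined(__APPLE__) && defined(__MACH__) /* assumed macOS test */
    return SKETCH_OS_MAC;
#elif defined(__FreeBSD__)
    return SKETCH_OS_FREEBSD;
#elif defined(__OpenBSD__)
    return SKETCH_OS_OPENBSD;
#else
    return SKETCH_OS_LINUX;
#endif
}

/* Same mapping as the ITT_PLATFORM chain: an OS without its own platform
 * entry (Linux) falls through to POSIX. */
static int sketch_platform_for(int os)
{
    /* Only the five known OS identifiers are meaningful here. */
    assert(os >= SKETCH_OS_WIN && os <= SKETCH_OS_OPENBSD);
    switch (os) {
    case SKETCH_OS_WIN:     return SKETCH_PLATFORM_WIN;
    case SKETCH_OS_MAC:     return SKETCH_PLATFORM_MAC;
    case SKETCH_OS_FREEBSD: return SKETCH_PLATFORM_FREEBSD;
    case SKETCH_OS_OPENBSD: return SKETCH_PLATFORM_OPENBSD;
    default:                return SKETCH_PLATFORM_POSIX;
    }
}
```

For example, `sketch_platform_for(SKETCH_OS_OPENBSD)` yields `SKETCH_PLATFORM_OPENBSD`, while `sketch_platform_for(SKETCH_OS_LINUX)` yields `SKETCH_PLATFORM_POSIX`. Placing the OpenBSD test before the final `#else` is what keeps OpenBSD builds from being misclassified as Linux.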
@@ -167,7 +127,12 @@
#if ITT_PLATFORM==ITT_PLATFORM_WIN
/* use __forceinline (VC++ specific) */
#define ITT_INLINE __forceinline
#if defined(__MINGW32__) && !defined(__cplusplus)
#define ITT_INLINE static __inline__ __attribute__((__always_inline__,__gnu_inline__))
#else
#define ITT_INLINE static __forceinline
#endif /* __MINGW32__ */
#define ITT_INLINE_ATTRIBUTE /* nothing */
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
/*
@@ -219,20 +184,20 @@
#define ITTNOTIFY_VOID(n) (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)
#define ITTNOTIFY_DATA(n) (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)
#define ITTNOTIFY_VOID_D0(n,d) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_VOID_D1(n,d,x) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_VOID_D2(n,d,x,y) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_VOID_D3(n,d,x,y,z) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_VOID_D4(n,d,x,y,z,a) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_VOID_D5(n,d,x,y,z,a,b) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_VOID_D6(n,d,x,y,z,a,b,c) (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#define ITTNOTIFY_DATA_D0(n,d) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_DATA_D1(n,d,x) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_DATA_D2(n,d,x,y) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_DATA_D3(n,d,x,y,z) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_DATA_D4(n,d,x,y,z,a) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_DATA_D5(n,d,x,y,z,a,b) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_DATA_D6(n,d,x,y,z,a,b,c) (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#define ITTNOTIFY_VOID_D0(n,d) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_VOID_D1(n,d,x) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_VOID_D2(n,d,x,y) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_VOID_D3(n,d,x,y,z) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_VOID_D4(n,d,x,y,z,a) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_VOID_D5(n,d,x,y,z,a,b) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_VOID_D6(n,d,x,y,z,a,b,c) (d == NULL) ? (void)0 : (!(d)->flags) ? (void)0 : (!ITTNOTIFY_NAME(n)) ? (void)0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#define ITTNOTIFY_DATA_D0(n,d) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d)
#define ITTNOTIFY_DATA_D1(n,d,x) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x)
#define ITTNOTIFY_DATA_D2(n,d,x,y) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y)
#define ITTNOTIFY_DATA_D3(n,d,x,y,z) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z)
#define ITTNOTIFY_DATA_D4(n,d,x,y,z,a) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a)
#define ITTNOTIFY_DATA_D5(n,d,x,y,z,a,b) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b)
#define ITTNOTIFY_DATA_D6(n,d,x,y,z,a,b,c) (d == NULL) ? 0 : (!(d)->flags) ? 0 : (!ITTNOTIFY_NAME(n)) ? 0 : ITTNOTIFY_NAME(n)(d,x,y,z,a,b,c)
#ifdef ITT_STUB
#undef ITT_STUB
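The only functional change in this hunk is the extra `d == NULL` test at the front of every `ITTNOTIFY_VOID_D*` and `ITTNOTIFY_DATA_D*` macro, so a call made with a NULL domain pointer now short-circuits instead of dereferencing `(d)->flags`. The sketch below reproduces the shape of `ITTNOTIFY_DATA_D1` with hypothetical stand-ins (`sketch_domain` for `__itt_domain`, a plain function pointer for `ITTNOTIFY_NAME(n)`); it illustrates the guard order and is not the ittnotify implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for __itt_domain: only the flags field matters here. */
typedef struct sketch_domain {
    int flags; /* non-zero when collection is enabled for this domain */
} sketch_domain;

/* Hypothetical stand-in for the ITTNOTIFY_NAME(n) function pointer, which
 * stays NULL until a collector is attached. */
typedef int (*sketch_task_fn)(const sketch_domain *d, int x);
static sketch_task_fn sketch_task_ptr = NULL;

/* Same guard order as the new ITTNOTIFY_DATA_D1: domain pointer first,
 * then its flags, then the function pointer; only then make the call. */
#define SKETCH_DATA_D1(fn, d, x) \
    (((d) == NULL) ? 0 : (!(d)->flags) ? 0 : (!(fn)) ? 0 : (fn)((d), (x)))

/* Hypothetical collector: the guard ensures it only ever sees a valid,
 * enabled domain. */
static int sketch_collector(const sketch_domain *d, int x)
{
    assert(d != NULL && d->flags);
    return x * 2; /* stands in for whatever the collector returns */
}
```

While `sketch_task_ptr` is NULL, or when the domain is NULL or has `flags == 0`, `SKETCH_DATA_D1` evaluates to 0 without calling anything; after `sketch_task_ptr = sketch_collector`, an enabled domain and `x = 7` give 14. Unlike the upstream macros, which write `d == NULL`, the sketch parenthesizes the argument as `(d) == NULL`, so an argument that is itself, say, a conditional expression is still compared as a whole.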
@@ -269,7 +234,7 @@ extern "C" {
* only pauses tracing and analyzing memory access.
* It does not pause tracing or analyzing threading APIs.
* .
* - Intel(R) Parallel Amplifier and Intel(R) VTune(TM) Amplifier XE:
* - Intel(R) VTune(TM) Profiler:
* - Does continue to record when new threads are started.
* .
* - Other effects:
@@ -1005,9 +970,9 @@ ITT_STUB(ITTAPI, __itt_frame, frame_create, (const char *domain))
#endif /* INTEL_NO_MACRO_BODY */
/** @endcond */
/** @brief Record an frame begin occurrence. */
/** @brief Record a frame begin occurrence. */
void ITTAPI __itt_frame_begin(__itt_frame frame);
/** @brief Record an frame end occurrence. */
/** @brief Record a frame end occurrence. */
void ITTAPI __itt_frame_end (__itt_frame frame);
/** @cond exclude_from_documentation */

@@ -1,60 +1,8 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#ifndef _LIBITTNOTIFY_H_
#define _LIBITTNOTIFY_H_

@@ -1,241 +0,0 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
/*
* This file implements an interface bridge from Low-Level Virtual Machine
* llvm::JITEventListener to Intel JIT Profiling API. It passes the function
* and line information to the appropriate functions in the JIT profiling
* interface so that any LLVM-based JIT engine can emit the JIT code
* notifications that the profiler will receive.
*
* Usage model:
*
* 1. Register the listener implementation instance with the execution engine:
*
* #include <llvm_jit_event_listener.hpp>
* ...
* ExecutionEngine *TheExecutionEngine;
* ...
* TheExecutionEngine = EngineBuilder(TheModule).create();
* ...
* __itt_llvm_jit_event_listener jitListener;
* TheExecutionEngine->RegisterJITEventListener(&jitListener);
* ...
*
* 2. When compiling make sure to add the ITT API include directory to the
* compiler include directories, ITT API library directory to the linker
* library directories and link with jitprofling static library.
*/
#ifndef __ITT_LLVM_JIT_EVENT_LISTENER_HPP__
#define __ITT_LLVM_JIT_EVENT_LISTENER_HPP__
#include "jitprofiling.h"
#include <llvm/Function.h>
#include <llvm/ExecutionEngine/JITEventListener.h>
#include <llvm/ADT/StringRef.h>
#include <llvm/Analysis/DebugInfo.h>
#include <map>
#include <cassert>
// Uncomment the line below to turn on logging to stderr
#define JITPROFILING_DEBUG_ENABLE
// Some elementary logging support
#ifdef JITPROFILING_DEBUG_ENABLE
#include <cstdio>
#include <cstdarg>
static void _jit_debug(const char* format, ...)
{
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
va_end(args);
}
// Use the macro as JITDEBUG(("foo: %d", foo_val));
#define JITDEBUG(x) \
do { \
_jit_debug("jit-listener: "); \
_jit_debug x; \
} \
while (0)
#else
#define JITDEBUG(x)
#endif
// LLVM JIT event listener, translates the notifications to the JIT profiling
// API information.
class __itt_llvm_jit_event_listener : public llvm::JITEventListener
{
public:
__itt_llvm_jit_event_listener() {}
public:
virtual void NotifyFunctionEmitted(const llvm::Function &F,
void *Code, size_t Size, const EmittedFunctionDetails &Details)
{
std::string name = F.getName().str();
JITDEBUG(("function jitted:\n"));
JITDEBUG((" addr=0x%08x\n", (int)Code));
JITDEBUG((" name=`%s'\n", name.c_str()));
JITDEBUG((" code-size=%d\n", (int)Size));
JITDEBUG((" line-infos-count=%d\n", Details.LineStarts.size()));
// The method must not be in the map - the entry must have been cleared
// from the map in NotifyFreeingMachineCode in case of rejitting.
assert(m_addr2MethodId.find(Code) == m_addr2MethodId.end());
int mid = iJIT_GetNewMethodID();
m_addr2MethodId[Code] = mid;
iJIT_Method_Load mload;
memset(&mload, 0, sizeof mload);
mload.method_id = mid;
// Populate the method size and name information
// TODO: The JIT profiling API should have members as const char pointers.
mload.method_name = (char*)name.c_str();
mload.method_load_address = Code;
mload.method_size = (unsigned int)Size;
// Populate line information now.
// From the JIT API documentation it is not quite clear whether the
// line information can be given in ranges, so we'll populate it for
// every byte of the function, hmm.
std::string srcFilePath;
std::vector<LineNumberInfo> lineInfos;
char *addr = (char*)Code;
char *lineAddr = addr; // Exclusive end point at which current
// line info changes.
const llvm::DebugLoc* loc = 0; // Current line info
int lineIndex = -1; // Current index into the line info table
for (int i = 0; i < Size; ++i, ++addr) {
while (addr >= lineAddr) {
if (lineIndex >= 0 && lineIndex < Details.LineStarts.size()) {
loc = &Details.LineStarts[lineIndex].Loc;
std::string p = getSrcFilePath(F.getContext(), *loc);
assert(srcFilePath.empty() || p == srcFilePath);
srcFilePath = p;
} else {
loc = NULL;
}
lineIndex++;
if (lineIndex >= 0 && lineIndex < Details.LineStarts.size()) {
lineAddr = (char*)Details.LineStarts[lineIndex].Address;
} else {
lineAddr = addr + Size;
}
}
if (loc) {
int line = loc->getLine();
LineNumberInfo info = { i, line };
lineInfos.push_back(info);
JITDEBUG((" addr 0x%08x -> line %d\n", addr, line));
}
}
if (!lineInfos.empty()) {
mload.line_number_size = lineInfos.size();
JITDEBUG((" translated to %d line infos to JIT", (int)lineInfos.size()));
mload.line_number_table = &lineInfos[0];
mload.source_file_name = (char*)srcFilePath.c_str();
}
iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED, &mload);
}
virtual void NotifyFreeingMachineCode(void *OldPtr)
{
JITDEBUG(("function unjitted\n"));
JITDEBUG((" addr=0x%08x\n", (int)OldPtr));
Addr2MethodId::iterator it = m_addr2MethodId.find(OldPtr);
assert(it != m_addr2MethodId.end());
iJIT_Method_Id mid = { it->second };
iJIT_NotifyEvent(iJVM_EVENT_TYPE_METHOD_UNLOAD_START, &mid);
m_addr2MethodId.erase(it);
}
private:
std::string getSrcFilePath(const llvm::LLVMContext& ctx, const llvm::DebugLoc& loc)
{
llvm::MDNode* node = loc.getAsMDNode(ctx);
llvm::DILocation srcLoc(node);
return srcLoc.getDirectory().str() + "/" + srcLoc.getFilename().str();
}
private:
/// Don't copy
__itt_llvm_jit_event_listener(const __itt_llvm_jit_event_listener&);
__itt_llvm_jit_event_listener& operator=(const __itt_llvm_jit_event_listener&);
private:
typedef std::vector<LineNumberInfo> LineInfoList;
// The method unload notification in VTune JIT profiling API takes the
// method ID, not method address so have to maintain the mapping. Is
// there a more efficient and simple way to do this like attaching the
// method ID information somehow to the LLVM function instance?
//
// TODO: It would be more convenient for the JIT API to take the method
// address, not method ID.
typedef std::map<const void*, int> Addr2MethodId;
Addr2MethodId m_addr2MethodId;
};
#endif // Header guard
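The deleted listener turns LLVM's `LineStarts` (the address where each source line begins) into a `LineNumberInfo` entry for every byte of the emitted function because, as its own comment says, it was unclear whether ranges were accepted. A minimal sketch of that expansion, using offsets instead of raw addresses, a hypothetical `sketch_line_info` in place of `LineNumberInfo`, and assuming the start offsets are sorted in ascending order:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of LineNumberInfo: offset into the code region
 * plus the source line that byte belongs to. */
typedef struct sketch_line_info {
    unsigned int offset;
    unsigned int line;
} sketch_line_info;

/* Expand sorted line starts into one entry per code byte, like the deleted
 * listener. Bytes before the first start get no entry; bytes from the last
 * start to the end of the region inherit the last line. `out` must have
 * room for `code_size` entries. Returns the number of entries written. */
static size_t sketch_expand_lines(const unsigned int *start_offsets,
                                  const unsigned int *lines,
                                  size_t n_starts,
                                  unsigned int code_size,
                                  sketch_line_info *out)
{
    size_t count = 0;
    size_t next = 0;          /* next line start not yet reached */
    int have_line = 0;        /* becomes 1 once the first start is reached */
    unsigned int current = 0; /* line in effect for the current byte */

    assert(out != NULL || code_size == 0);
    for (unsigned int i = 0; i < code_size; ++i) {
        /* Advance past every line start at or before this byte; with
         * duplicate offsets the last one wins, as in the listener's loop. */
        while (next < n_starts && start_offsets[next] <= i) {
            current = lines[next];
            have_line = 1;
            ++next;
        }
        if (have_line) {
            out[count].offset = i;
            out[count].line = current;
            ++count;
        }
    }
    return count;
}
```

Given starts `{2, 4}` with lines `{7, 8}` and a 5-byte region, the sketch yields `(2,7)`, `(3,7)`, `(4,8)`: bytes 0 and 1 precede the first start and are skipped, matching the listener leaving `loc` NULL until the first start is reached. The table grows linearly with code size, which is the cost of the per-byte choice.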

@@ -1,7 +1,8 @@
Copyright (c) 2011, Intel Corporation
All rights reserved.
Copyright (c) 2019 Intel Corporation. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
• Neither the name of the Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@@ -1,65 +1,103 @@
The GNU General Public License (GPL)
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.
10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.
One line to give the program's name and a brief idea of what it does.
Copyright (C) <year> <name of author>
<one line to give the program's name and an idea of what it does.>
Copyright (C) <yyyy> <name of author>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker.
signature of Ty Coon, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.
<signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice
This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.


@ -1,71 +1,23 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#include "ittnotify_config.h"
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#if defined _MSC_VER
#pragma warning (disable: 593) /* parameter "XXXX" was set but never used */
#pragma warning (disable: 344) /* typedef name has already been declared (with same type) */
#pragma warning (disable: 174) /* expression has no effect */
#pragma warning (disable: 4127) /* conditional expression is constant */
#pragma warning (disable: 4306) /* conversion from '?' to '?' of greater size */
#endif
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#if defined __INTEL_COMPILER


@ -1,60 +1,8 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#ifndef _ITTNOTIFY_CONFIG_H_
#define _ITTNOTIFY_CONFIG_H_
@ -75,6 +23,10 @@
# define ITT_OS_FREEBSD 4
#endif /* ITT_OS_FREEBSD */
#ifndef ITT_OS_OPENBSD
# define ITT_OS_OPENBSD 5
#endif /* ITT_OS_OPENBSD */
#ifndef ITT_OS
# if defined WIN32 || defined _WIN32
# define ITT_OS ITT_OS_WIN
@ -82,6 +34,8 @@
# define ITT_OS ITT_OS_MAC
# elif defined( __FreeBSD__ )
# define ITT_OS ITT_OS_FREEBSD
# elif defined( __OpenBSD__ )
# define ITT_OS ITT_OS_OPENBSD
# else
# define ITT_OS ITT_OS_LINUX
# endif
@ -103,6 +57,10 @@
# define ITT_PLATFORM_FREEBSD 4
#endif /* ITT_PLATFORM_FREEBSD */
#ifndef ITT_PLATFORM_OPENBSD
# define ITT_PLATFORM_OPENBSD 5
#endif /* ITT_PLATFORM_OPENBSD */
#ifndef ITT_PLATFORM
# if ITT_OS==ITT_OS_WIN
# define ITT_PLATFORM ITT_PLATFORM_WIN
@ -110,6 +68,8 @@
# define ITT_PLATFORM ITT_PLATFORM_MAC
# elif ITT_OS==ITT_OS_FREEBSD
# define ITT_PLATFORM ITT_PLATFORM_FREEBSD
# elif ITT_OS==ITT_OS_OPENBSD
# define ITT_PLATFORM ITT_PLATFORM_OPENBSD
# else
# define ITT_PLATFORM ITT_PLATFORM_POSIX
# endif
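The two hunks above add an OpenBSD branch to the OS and platform detection cascades. Below is a minimal sketch of the same pattern. It assumes only the standard predefined macros `_WIN32`, `__APPLE__`, `__FreeBSD__` and `__OpenBSD__`. The `SKETCH_OS_*` names are hypothetical and not part of ittnotify, and only the FreeBSD (4) and OpenBSD (5) values mirror constants visible in the diff.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical OS identifiers; only 4 and 5 mirror ITT_OS_FREEBSD/OPENBSD. */
#define SKETCH_OS_WIN     1
#define SKETCH_OS_LINUX   2
#define SKETCH_OS_MAC     3
#define SKETCH_OS_FREEBSD 4
#define SKETCH_OS_OPENBSD 5

/* The #ifndef guard lets a build override detection, like ITT_OS. */
#ifndef SKETCH_OS
#  if defined(_WIN32)
#    define SKETCH_OS SKETCH_OS_WIN
#  elif defined(__APPLE__)
#    define SKETCH_OS SKETCH_OS_MAC
#  elif defined(__FreeBSD__)
#    define SKETCH_OS SKETCH_OS_FREEBSD
#  elif defined(__OpenBSD__)
#    define SKETCH_OS SKETCH_OS_OPENBSD
#  else
#    define SKETCH_OS SKETCH_OS_LINUX /* fallback, as in the ittnotify cascade */
#  endif
#endif

/* Map the compile-time selection to a printable name. */
static const char *sketch_os_name(void)
{
    switch (SKETCH_OS) {
    case SKETCH_OS_WIN:     return "windows";
    case SKETCH_OS_MAC:     return "mac";
    case SKETCH_OS_FREEBSD: return "freebsd";
    case SKETCH_OS_OPENBSD: return "openbsd";
    default:                return "linux";
    }
}
```

Each new OS needs both a new constant and a new `#elif` placed before the final `#else`, because the Linux fallback catches everything that is not matched earlier.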
@ -162,7 +122,12 @@
#if ITT_PLATFORM==ITT_PLATFORM_WIN
/* use __forceinline (VC++ specific) */
#define ITT_INLINE __forceinline
#if defined(__MINGW32__) && !defined(__cplusplus)
#define ITT_INLINE static __inline__ __attribute__((__always_inline__,__gnu_inline__))
#else
#define ITT_INLINE static __forceinline
#endif /* __MINGW32__ */
#define ITT_INLINE_ATTRIBUTE /* nothing */
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
/*
@ -188,6 +153,10 @@
# define ITT_ARCH_IA32E 2
#endif /* ITT_ARCH_IA32E */
#ifndef ITT_ARCH_IA64
# define ITT_ARCH_IA64 3
#endif /* ITT_ARCH_IA64 */
#ifndef ITT_ARCH_ARM
# define ITT_ARCH_ARM 4
#endif /* ITT_ARCH_ARM */
@ -196,9 +165,9 @@
# define ITT_ARCH_PPC64 5
#endif /* ITT_ARCH_PPC64 */
#ifndef ITT_ARCH_AARCH64 /* 64-bit ARM */
# define ITT_ARCH_AARCH64 6
#endif /* ITT_ARCH_AARCH64 */
#ifndef ITT_ARCH_ARM64
# define ITT_ARCH_ARM64 6
#endif /* ITT_ARCH_ARM64 */
#ifndef ITT_ARCH
# if defined _M_IX86 || defined __i386__
@ -210,7 +179,7 @@
# elif defined _M_ARM || defined __arm__
# define ITT_ARCH ITT_ARCH_ARM
# elif defined __aarch64__
# define ITT_ARCH ITT_ARCH_AARCH64
# define ITT_ARCH ITT_ARCH_ARM64
# elif defined __powerpc64__
# define ITT_ARCH ITT_ARCH_PPC64
# endif
@ -239,10 +208,10 @@
#define ITT_MAGIC { 0xED, 0xAB, 0xAB, 0xEC, 0x0D, 0xEE, 0xDA, 0x30 }
/* Replace with snapshot date YYYYMMDD for promotion build. */
#define API_VERSION_BUILD 20151119
#define API_VERSION_BUILD 20250113
#ifndef API_VERSION_NUM
#define API_VERSION_NUM 0.0.0
#define API_VERSION_NUM 3.25.4
#endif /* API_VERSION_NUM */
#define API_VERSION "ITT-API-Version " ITT_TO_STR(API_VERSION_NUM) \
@ -254,7 +223,11 @@
typedef HMODULE lib_t;
typedef DWORD TIDT;
typedef CRITICAL_SECTION mutex_t;
#ifdef __cplusplus
#define MUTEX_INITIALIZER {}
#else
#define MUTEX_INITIALIZER { 0 }
#endif
#define strong_alias(name, aliasname) /* empty for Windows */
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#include <dlfcn.h>
@ -282,13 +255,13 @@ typedef pthread_mutex_t mutex_t;
#define __itt_mutex_init(mutex) InitializeCriticalSection(mutex)
#define __itt_mutex_lock(mutex) EnterCriticalSection(mutex)
#define __itt_mutex_unlock(mutex) LeaveCriticalSection(mutex)
#define __itt_mutex_destroy(mutex) DeleteCriticalSection(mutex)
#define __itt_load_lib(name) LoadLibraryA(name)
#define __itt_unload_lib(handle) FreeLibrary(handle)
#define __itt_system_error() (int)GetLastError()
#define __itt_fstrcmp(s1, s2) lstrcmpA(s1, s2)
#define __itt_fstrnlen(s, l) strnlen_s(s, l)
#define __itt_fstrcpyn(s1, b, s2, l) strncpy_s(s1, b, s2, l)
#define __itt_fstrdup(s) _strdup(s)
#define __itt_thread_id() GetCurrentThreadId()
#define __itt_thread_yield() SwitchToThread()
#ifndef ITT_SIMPLE_INIT
@ -298,6 +271,13 @@ ITT_INLINE long __itt_interlocked_increment(volatile long* ptr)
{
return InterlockedIncrement(ptr);
}
ITT_INLINE long
__itt_interlocked_compare_exchange(volatile long* ptr, long exchange, long comperand) ITT_INLINE_ATTRIBUTE;
ITT_INLINE long
__itt_interlocked_compare_exchange(volatile long* ptr, long exchange, long comperand)
{
return InterlockedCompareExchange(ptr, exchange, comperand);
}
#endif /* ITT_SIMPLE_INIT */
#define DL_SYMBOLS (1)
@ -327,6 +307,7 @@ ITT_INLINE long __itt_interlocked_increment(volatile long* ptr)
}
#define __itt_mutex_lock(mutex) pthread_mutex_lock(mutex)
#define __itt_mutex_unlock(mutex) pthread_mutex_unlock(mutex)
#define __itt_mutex_destroy(mutex) pthread_mutex_destroy(mutex)
#define __itt_load_lib(name) dlopen(name, RTLD_LAZY)
#define __itt_unload_lib(handle) dlclose(handle)
#define __itt_system_error() errno
@ -341,10 +322,18 @@ ITT_INLINE long __itt_interlocked_increment(volatile long* ptr)
#ifdef SDL_STRNCPY_S
#define __itt_fstrcpyn(s1, b, s2, l) SDL_STRNCPY_S(s1, b, s2, l)
#else
#define __itt_fstrcpyn(s1, b, s2, l) strncpy(s1, s2, b)
#define __itt_fstrcpyn(s1, b, s2, l) { \
if (b > 0) { \
/* 'volatile' is used to suppress the warning that a destination */ \
/* bound depends on the length of the source. */ \
volatile size_t num_to_copy = (size_t)(b - 1) < (size_t)(l) ? \
(size_t)(b - 1) : (size_t)(l); \
strncpy(s1, s2, num_to_copy); \
s1[num_to_copy] = 0; \
} \
}
#endif /* SDL_STRNCPY_S */
#define __itt_fstrdup(s) strdup(s)
#define __itt_thread_id() pthread_self()
#define __itt_thread_yield() sched_yield()
#if ITT_ARCH==ITT_ARCH_IA64
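The old POSIX `__itt_fstrcpyn` expanded to `strncpy(s1, s2, b)`. That call leaves the destination without a terminating NUL whenever the source is at least `b` bytes long. The replacement copies at most `min(b - 1, l)` bytes and then writes the terminator explicitly. A minimal function-style sketch of that logic follows. `bounded_copy` is a hypothetical name, and the `volatile` trick used in the macro only to silence a compiler warning is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most min(dst_size - 1, max_len) bytes of src into dst and always
 * NUL-terminate, mirroring the new __itt_fstrcpyn(s1, b, s2, l) macro. */
static void bounded_copy(char *dst, size_t dst_size, const char *src, size_t max_len)
{
    assert(src != NULL);
    if (dst_size > 0) {
        size_t n = (dst_size - 1 < max_len) ? dst_size - 1 : max_len;
        strncpy(dst, src, n); /* may stop without writing a terminator */
        dst[n] = '\0';        /* so terminate explicitly */
    }
}
```

For example, copying `"abcdef"` into a 4-byte buffer yields `"abc"` instead of four unterminated bytes.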
@ -360,12 +349,12 @@ ITT_INLINE long __TBB_machine_fetchadd4(volatile void* ptr, long addend)
{
long result;
__asm__ __volatile__("lock\nxadd %0,%1"
: "=r"(result),"=m"(*(int*)ptr)
: "0"(addend), "m"(*(int*)ptr)
: "=r"(result),"=m"(*(volatile int*)ptr)
: "0"(addend), "m"(*(volatile int*)ptr)
: "memory");
return result;
}
#elif ITT_ARCH==ITT_ARCH_ARM || ITT_ARCH==ITT_ARCH_AARCH64 || ITT_ARCH==ITT_ARCH_PPC64
#else
#define __TBB_machine_fetchadd4(addr, val) __sync_fetch_and_add(addr, val)
#endif /* ITT_ARCH==ITT_ARCH_IA64 */
#ifndef ITT_SIMPLE_INIT
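Both the `lock xadd` inline assembly and the `__sync_fetch_and_add` fallback return the value *before* the addition. `__itt_interlocked_increment` therefore adds `1L` so that it returns the post-increment value, as Windows' `InterlockedIncrement` does. The sketch below assumes a GCC or Clang compiler for the `__sync` builtin. `interlocked_increment_sketch` is a hypothetical name.

```c
#include <assert.h>

/* __sync_fetch_and_add atomically adds and returns the old value;
 * adding 1 gives the new value, matching InterlockedIncrement. */
static long interlocked_increment_sketch(volatile long *ptr)
{
    return __sync_fetch_and_add(ptr, 1L) + 1L;
}
```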
@ -375,6 +364,13 @@ ITT_INLINE long __itt_interlocked_increment(volatile long* ptr)
{
return __TBB_machine_fetchadd4(ptr, 1) + 1L;
}
ITT_INLINE long
__itt_interlocked_compare_exchange(volatile long* ptr, long exchange, long comperand) ITT_INLINE_ATTRIBUTE;
ITT_INLINE long
__itt_interlocked_compare_exchange(volatile long* ptr, long exchange, long comperand)
{
return __sync_val_compare_and_swap(ptr, comperand, exchange);
}
#endif /* ITT_SIMPLE_INIT */
void* dlopen(const char*, int) __attribute__((weak));
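The new `__itt_interlocked_compare_exchange` wraps two APIs with different argument orders. `InterlockedCompareExchange(ptr, exchange, comperand)` stores `exchange` only if `*ptr == comperand`. The GCC/Clang builtin `__sync_val_compare_and_swap(ptr, oldval, newval)` stores `newval` only if `*ptr == oldval`. Both return the previous value. A portable wrapper therefore has to swap the last two arguments when it calls the builtin. A minimal sketch follows, assuming a GCC or Clang compiler. `interlocked_cas_sketch` is a hypothetical name.

```c
#include <assert.h>

/* Windows-style signature on top of the GCC/Clang builtin: the comparand
 * is the builtin's second argument and the new value is its third. */
static long interlocked_cas_sketch(volatile long *ptr, long exchange, long comperand)
{
    return __sync_val_compare_and_swap(ptr, comperand, exchange);
}
```

With this ordering, a "run once" guard such as `if (cas(&flag, 1, 0) == 0) { /* init */ }` lets exactly one caller through. With the arguments swapped, every caller would see 0 while `flag` stayed 0.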
@ -394,10 +390,20 @@ pthread_t pthread_self(void) __attribute__((weak));
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
typedef enum {
__itt_collection_normal = 0,
__itt_collection_paused = 1
} __itt_collection_state;
/* strdup() is not included into C99 which results in a compiler warning about
* implicitly declared symbol. To avoid the issue strdup is implemented
* manually.
*/
#define ITT_STRDUP_MAX_STRING_SIZE 4096
#define __itt_fstrdup(s, new_s) do { \
if (s != NULL) { \
size_t s_len = __itt_fstrnlen(s, ITT_STRDUP_MAX_STRING_SIZE); \
new_s = (char *)malloc(s_len + 1); \
if (new_s != NULL) { \
__itt_fstrcpyn(new_s, s_len + 1, s, s_len); \
} \
} \
} while(0)
typedef enum {
__itt_thread_normal = 0,
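The new `__itt_fstrdup(s, new_s)` macro replaces `strdup`, which is not part of C99. It takes an output variable rather than returning a value. This is why the `NEW_*_A` macros below first declare `char *name_copy = NULL;` and then pass it in. A NULL input leaves that variable NULL, and inputs longer than `ITT_STRDUP_MAX_STRING_SIZE` (4096) bytes are truncated. Below is a function-style sketch of the same behaviour. `bounded_strdup` and `STRDUP_MAX_SKETCH` are hypothetical names, and the length scan stands in for `__itt_fstrnlen`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRDUP_MAX_SKETCH 4096 /* mirrors ITT_STRDUP_MAX_STRING_SIZE */

/* Duplicate at most STRDUP_MAX_SKETCH bytes of s into a new NUL-terminated
 * buffer. Returns NULL for NULL input or allocation failure; caller frees. */
static char *bounded_strdup(const char *s)
{
    char *copy;
    size_t len = 0;
    if (s == NULL)
        return NULL;
    while (len < STRDUP_MAX_SKETCH && s[len] != '\0') /* bounded length scan */
        len++;
    copy = (char *)malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}
```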
@ -463,6 +469,10 @@ typedef struct __itt_counter_info
struct ___itt_domain;
struct ___itt_string_handle;
struct ___itt_histogram;
struct ___itt_counter_metadata;
#include "ittnotify.h"
typedef struct ___itt_global
{
@ -484,7 +494,10 @@ typedef struct ___itt_global
struct ___itt_domain* domain_list;
struct ___itt_string_handle* string_list;
__itt_collection_state state;
__itt_counter_info_t* counter_list;
__itt_counter_info_t* counter_list;
unsigned int ipt_collect_events;
struct ___itt_histogram* histogram_list;
struct ___itt_counter_metadata* counter_metadata_list;
} __itt_global;
#pragma pack(pop)
@ -510,7 +523,9 @@ typedef struct ___itt_global
h = (__itt_thread_info*)malloc(sizeof(__itt_thread_info)); \
if (h != NULL) { \
h->tid = t; \
h->nameA = n ? __itt_fstrdup(n) : NULL; \
char *n_copy = NULL; \
__itt_fstrdup(n, n_copy); \
h->nameA = n_copy; \
h->nameW = NULL; \
h->state = s; \
h->extra1 = 0; /* reserved */ \
@ -543,7 +558,9 @@ typedef struct ___itt_global
h = (__itt_domain*)malloc(sizeof(__itt_domain)); \
if (h != NULL) { \
h->flags = 1; /* domain is enabled by default */ \
h->nameA = name ? __itt_fstrdup(name) : NULL; \
char *name_copy = NULL; \
__itt_fstrdup(name, name_copy); \
h->nameA = name_copy; \
h->nameW = NULL; \
h->extra1 = 0; /* reserved */ \
h->extra2 = NULL; /* reserved */ \
@ -573,7 +590,9 @@ typedef struct ___itt_global
#define NEW_STRING_HANDLE_A(gptr,h,h_tail,name) { \
h = (__itt_string_handle*)malloc(sizeof(__itt_string_handle)); \
if (h != NULL) { \
h->strA = name ? __itt_fstrdup(name) : NULL; \
char *name_copy = NULL; \
__itt_fstrdup(name, name_copy); \
h->strA = name_copy; \
h->strW = NULL; \
h->extra1 = 0; /* reserved */ \
h->extra2 = NULL; /* reserved */ \
@ -591,7 +610,7 @@ typedef struct ___itt_global
h->nameA = NULL; \
h->nameW = name ? _wcsdup(name) : NULL; \
h->domainA = NULL; \
h->domainW = name ? _wcsdup(domain) : NULL; \
h->domainW = domain ? _wcsdup(domain) : NULL; \
h->type = type; \
h->index = 0; \
h->next = NULL; \
@ -605,9 +624,13 @@ typedef struct ___itt_global
#define NEW_COUNTER_A(gptr,h,h_tail,name,domain,type) { \
h = (__itt_counter_info_t*)malloc(sizeof(__itt_counter_info_t)); \
if (h != NULL) { \
h->nameA = name ? __itt_fstrdup(name) : NULL; \
char *name_copy = NULL; \
__itt_fstrdup(name, name_copy); \
h->nameA = name_copy; \
h->nameW = NULL; \
h->domainA = domain ? __itt_fstrdup(domain) : NULL; \
char *domain_copy = NULL; \
__itt_fstrdup(domain, domain_copy); \
h->domainA = domain_copy; \
h->domainW = NULL; \
h->type = type; \
h->index = 0; \
@ -619,4 +642,98 @@ typedef struct ___itt_global
} \
}
#define NEW_HISTOGRAM_W(gptr,h,h_tail,domain,name,x_type,y_type) { \
h = (__itt_histogram*)malloc(sizeof(__itt_histogram)); \
if (h != NULL) { \
h->domain = domain; \
h->nameA = NULL; \
h->nameW = name ? _wcsdup(name) : NULL; \
h->x_type = x_type; \
h->y_type = y_type; \
h->extra1 = 0; \
h->extra2 = NULL; \
h->next = NULL; \
if (h_tail == NULL) \
(gptr)->histogram_list = h; \
else \
h_tail->next = h; \
} \
}
#define NEW_HISTOGRAM_A(gptr,h,h_tail,domain,name,x_type,y_type) { \
h = (__itt_histogram*)malloc(sizeof(__itt_histogram)); \
if (h != NULL) { \
h->domain = domain; \
char *name_copy = NULL; \
__itt_fstrdup(name, name_copy); \
h->nameA = name_copy; \
h->nameW = NULL; \
h->x_type = x_type; \
h->y_type = y_type; \
h->extra1 = 0; \
h->extra2 = NULL; \
h->next = NULL; \
if (h_tail == NULL) \
(gptr)->histogram_list = h; \
else \
h_tail->next = h; \
} \
}
#define NEW_COUNTER_METADATA_NUM(gptr,h,h_tail,counter,type,value) { \
h = (__itt_counter_metadata*)malloc(sizeof(__itt_counter_metadata)); \
if (h != NULL) { \
h->counter = counter; \
h->type = type; \
h->str_valueA = NULL; \
h->str_valueW = NULL; \
h->value = value; \
h->extra1 = 0; \
h->extra2 = NULL; \
h->next = NULL; \
if (h_tail == NULL) \
(gptr)->counter_metadata_list = h; \
else \
h_tail->next = h; \
} \
}
#define NEW_COUNTER_METADATA_STR_A(gptr,h,h_tail,counter,type,str_valueA) { \
h = (__itt_counter_metadata*)malloc(sizeof(__itt_counter_metadata)); \
if (h != NULL) { \
h->counter = counter; \
h->type = type; \
char *str_value_copy = NULL; \
__itt_fstrdup(str_valueA, str_value_copy); \
h->str_valueA = str_value_copy; \
h->str_valueW = NULL; \
h->value = 0; \
h->extra1 = 0; \
h->extra2 = NULL; \
h->next = NULL; \
if (h_tail == NULL) \
(gptr)->counter_metadata_list = h; \
else \
h_tail->next = h; \
} \
}
#define NEW_COUNTER_METADATA_STR_W(gptr,h,h_tail,counter,type,str_valueW) { \
h = (__itt_counter_metadata*)malloc(sizeof(__itt_counter_metadata)); \
if (h != NULL) { \
h->counter = counter; \
h->type = type; \
h->str_valueA = NULL; \
h->str_valueW = str_valueW ? _wcsdup(str_valueW) : NULL; \
h->value = 0; \
h->extra1 = 0; \
h->extra2 = NULL; \
h->next = NULL; \
if (h_tail == NULL) \
(gptr)->counter_metadata_list = h; \
else \
h_tail->next = h; \
} \
}
#endif /* _ITTNOTIFY_CONFIG_H_ */
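The `NEW_*_A` macros above changed how they copy names and how they link nodes:

- **Copying.** They now pass the destination to `__itt_fstrdup` as an out-parameter (`__itt_fstrdup(name, name_copy)`). Previously they guarded the call with `name ? ... : NULL`.
- **Linking.** Each macro makes the new node the list head when `h_tail` is `NULL`. Otherwise it links the node after `h_tail`.

The definition of `__itt_fstrdup` is not part of this diff. The sketch below therefore assumes it leaves the destination `NULL` for a `NULL` source. `Node`, `dup_or_null`, `append_node` and `free_list` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical node mirroring the nameA/next fields of the ITT structs.
struct Node {
    char* nameA;
    Node* next;
};

// Copy src into a malloc'ed buffer. dst stays NULL when src is NULL or the
// allocation fails (assumed behaviour of the out-parameter __itt_fstrdup form).
static void dup_or_null(const char* src, char*& dst) {
    dst = nullptr;
    if (src == nullptr) return;
    std::size_t len = std::strlen(src) + 1;
    char* copy = static_cast<char*>(std::malloc(len));
    if (copy != nullptr) {
        std::memcpy(copy, src, len);
        dst = copy;
    }
}

// Same linking rule as the macros:
// if (h_tail == NULL) head = h; else h_tail->next = h;
static Node* append_node(Node*& head, Node* tail, const char* name) {
    Node* h = static_cast<Node*>(std::malloc(sizeof(Node)));
    if (h == nullptr) return nullptr;
    dup_or_null(name, h->nameA);
    h->next = nullptr;
    if (tail == nullptr)
        head = h;
    else
        tail->next = h;
    return h;
}

// Release every node together with its name copy.
static void free_list(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        std::free(head->nameA);
        std::free(head);
        head = next;
    }
}
```

The macros receive `h_tail` as a parameter rather than walking the list themselves. Like `append_node`, they rely on the caller to supply the current tail.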

File diff suppressed because it is too large

@ -1,60 +1,8 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#include "ittnotify_config.h"
@ -81,6 +29,9 @@ ITT_STUB(ITTAPI, __itt_domain*, domain_createW, (const wchar_t *name), (ITT_FORM
ITT_STUB(ITTAPI, __itt_domain*, domain_create, (const char *name), (ITT_FORMAT name), domain_create, __itt_group_structure, "\"%s\"")
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUBV(ITTAPI, void, module_load_with_sections, (__itt_module_object* module_obj), (ITT_FORMAT module_obj), module_load_with_sections, __itt_group_module, "%p")
ITT_STUBV(ITTAPI, void, module_unload_with_sections, (__itt_module_object* module_obj), (ITT_FORMAT module_obj), module_unload_with_sections, __itt_group_module, "%p")
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUB(ITTAPI, __itt_string_handle*, string_handle_createA, (const char *name), (ITT_FORMAT name), string_handle_createA, __itt_group_structure, "\"%s\"")
ITT_STUB(ITTAPI, __itt_string_handle*, string_handle_createW, (const wchar_t *name), (ITT_FORMAT name), string_handle_createW, __itt_group_structure, "\"%S\"")
@ -105,6 +56,8 @@ ITT_STUB(ITTAPI, __itt_counter, counter_create_typed, (const char *name, con
ITT_STUBV(ITTAPI, void, pause, (void), (ITT_NO_PARAMS), pause, __itt_group_control | __itt_group_legacy, "no args")
ITT_STUBV(ITTAPI, void, resume, (void), (ITT_NO_PARAMS), resume, __itt_group_control | __itt_group_legacy, "no args")
ITT_STUBV(ITTAPI, void, pause_scoped, (__itt_collection_scope scope), (ITT_FORMAT scope), pause_scoped, __itt_group_control, "%d")
ITT_STUBV(ITTAPI, void, resume_scoped, (__itt_collection_scope scope), (ITT_FORMAT scope), resume_scoped, __itt_group_control, "%d")
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUBV(ITTAPI, void, thread_set_nameA, (const char *name), (ITT_FORMAT name), thread_set_nameA, __itt_group_thread, "\"%s\"")
@ -121,6 +74,23 @@ ITT_STUB(LIBITTAPI, int, thr_name_setW, (const wchar_t *name, int namelen), (IT
ITT_STUB(LIBITTAPI, int, thr_name_set, (const char *name, int namelen), (ITT_FORMAT name, namelen), thr_name_set, __itt_group_thread | __itt_group_legacy, "\"%s\", %d")
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUBV(LIBITTAPI, void, thr_ignore, (void), (ITT_NO_PARAMS), thr_ignore, __itt_group_thread | __itt_group_legacy, "no args")
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUB(ITTAPI, __itt_histogram*, histogram_createA, (const __itt_domain* domain, const char* name, __itt_metadata_type x_type, __itt_metadata_type y_type), (ITT_FORMAT domain, name, x_type, y_type), histogram_createA, __itt_group_structure, "%p, \"%s\", %d, %d")
ITT_STUB(ITTAPI, __itt_histogram*, histogram_createW, (const __itt_domain* domain, const wchar_t* name, __itt_metadata_type x_type, __itt_metadata_type y_type), (ITT_FORMAT domain, name, x_type, y_type), histogram_createW, __itt_group_structure, "%p, \"%s\", %d, %d")
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUB(ITTAPI, __itt_histogram*, histogram_create, (const __itt_domain* domain, const char* name, __itt_metadata_type x_type, __itt_metadata_type y_type), (ITT_FORMAT domain, name, x_type, y_type), histogram_create, __itt_group_structure, "%p, \"%s\", %d, %d")
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUB(ITTAPI, __itt_counter, counter_createA_v3, (const __itt_domain* domain, const char *name, __itt_metadata_type type), (ITT_FORMAT domain, name, type), counter_createA_v3, __itt_group_counter, "%p, \"%s\", %d")
ITT_STUB(ITTAPI, __itt_counter, counter_createW_v3, (const __itt_domain* domain, const wchar_t *name, __itt_metadata_type type), (ITT_FORMAT domain, name, type), counter_createW_v3, __itt_group_counter, "%p, \"%s\", %d")
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUB(ITTAPI, __itt_counter, counter_create_v3, (const __itt_domain* domain, const char *name, __itt_metadata_type type), (ITT_FORMAT domain, name, type), counter_create_v3, __itt_group_counter, "%p, \"%s\", %d")
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUBV(ITTAPI, void, bind_context_metadata_to_counter, (__itt_counter counter, size_t length, __itt_context_metadata* metadata), (ITT_FORMAT counter, length, metadata), bind_context_metadata_to_counter, __itt_group_structure, "%p, %lu, %p")
#endif /* __ITT_INTERNAL_BODY */
ITT_STUBV(ITTAPI, void, enable_attach, (void), (ITT_NO_PARAMS), enable_attach, __itt_group_all, "no args")
@ -296,6 +266,13 @@ ITT_STUB(ITTAPI, __itt_frame, frame_createW, (const wchar_t *domain), (ITT_FORMA
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
ITT_STUB(ITTAPI, __itt_frame, frame_create, (const char *domain), (ITT_FORMAT domain), frame_create, __itt_group_frame, "\"%s\"")
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUB(ITTAPI, __itt_pt_region, pt_region_createA, (const char *name), (ITT_FORMAT name), pt_region_createA, __itt_group_structure, "\"%s\"")
ITT_STUB(ITTAPI, __itt_pt_region, pt_region_createW, (const wchar_t *name), (ITT_FORMAT name), pt_region_createW, __itt_group_structure, "\"%S\"")
#else /* ITT_PLATFORM!=ITT_PLATFORM_WIN */
ITT_STUB(ITTAPI, __itt_pt_region, pt_region_create, (const char *name), (ITT_FORMAT name), pt_region_create, __itt_group_structure, "\"%s\"")
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* __ITT_INTERNAL_BODY */
ITT_STUBV(ITTAPI, void, frame_begin, (__itt_frame frame), (ITT_FORMAT frame), frame_begin, __itt_group_frame, "%p")
ITT_STUBV(ITTAPI, void, frame_end, (__itt_frame frame), (ITT_FORMAT frame), frame_end, __itt_group_frame, "%p")
@ -376,14 +353,16 @@ ITT_STUB(ITTAPI, int, av_save, (void *data, int rank, const int *dimensions, in
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* __ITT_INTERNAL_BODY */
#ifndef __ITT_INTERNAL_BODY
#if ITT_PLATFORM==ITT_PLATFORM_WIN
ITT_STUBV(ITTAPI, void, module_loadA, (void *start_addr, void* end_addr, const char *path), (ITT_FORMAT start_addr, end_addr, path), module_loadA, __itt_group_none, "%p, %p, %p")
ITT_STUBV(ITTAPI, void, module_loadW, (void *start_addr, void* end_addr, const wchar_t *path), (ITT_FORMAT start_addr, end_addr, path), module_loadW, __itt_group_none, "%p, %p, %p")
ITT_STUBV(ITTAPI, void, module_loadA, (void *start_addr, void* end_addr, const char *path), (ITT_FORMAT start_addr, end_addr, path), module_loadA, __itt_group_module, "%p, %p, %p")
ITT_STUBV(ITTAPI, void, module_loadW, (void *start_addr, void* end_addr, const wchar_t *path), (ITT_FORMAT start_addr, end_addr, path), module_loadW, __itt_group_module, "%p, %p, %p")
#else /* ITT_PLATFORM!=ITT_PLATFORM_WIN */
ITT_STUBV(ITTAPI, void, module_load, (void *start_addr, void *end_addr, const char *path), (ITT_FORMAT start_addr, end_addr, path), module_load, __itt_group_none, "%p, %p, %p")
ITT_STUBV(ITTAPI, void, module_load, (void *start_addr, void *end_addr, const char *path), (ITT_FORMAT start_addr, end_addr, path), module_load, __itt_group_module, "%p, %p, %p")
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#endif /* __ITT_INTERNAL_BODY */
ITT_STUBV(ITTAPI, void, module_unload, (void *start_addr), (ITT_FORMAT start_addr), module_unload, __itt_group_module, "%p")
ITT_STUBV(ITTAPI, void, histogram_submit, (__itt_histogram* hist, size_t length, void* x_data, void* y_data), (ITT_FORMAT hist, length, x_data, y_data), histogram_submit, __itt_group_structure, "%p, %lu, %p, %p")
ITT_STUBV(ITTAPI, void, counter_set_value_v3, (__itt_counter counter, void *value_ptr), (ITT_FORMAT counter, value_ptr), counter_set_value_v3, __itt_group_counter, "%p, %p")
#endif /* __ITT_INTERNAL_INIT */
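Each `ITT_STUB`/`ITT_STUBV` line above reads like one row of a list. What a row expands to depends on how the including file defines those macros, which is the classic X-macro arrangement. The `__ITT_INTERNAL_INIT` and `__ITT_INTERNAL_BODY` guards select which rows a given expansion sees.

The simplified sketch below shows the technique with two hypothetical expansions: an enum and a name/group table. `DEMO_STUB_LIST` and the other `demo_`/`DEMO_` names are illustrative and not part of ittnotify. The group values echo this diff's move of `module_load*` from `__itt_group_none` to `__itt_group_module` (`1<<15`).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stub list: one X(name, group_mask) row per API.
#define DEMO_STUB_LIST(X)        \
    X(pause,         1 << 1)     \
    X(resume,        1 << 1)     \
    X(module_load,   1 << 15)    \
    X(module_unload, 1 << 15)

// Expansion 1: one enum entry per stub, plus a trailing count.
#define DEMO_AS_ENUM(name, group) stub_##name,
enum DemoStubId { DEMO_STUB_LIST(DEMO_AS_ENUM) stub_count };
#undef DEMO_AS_ENUM

// Expansion 2: a name/group table in the same order as the enum.
struct DemoStubInfo { const char* name; int group; };
#define DEMO_AS_ROW(name, group) { #name, group },
static const DemoStubInfo demo_stub_table[] = { DEMO_STUB_LIST(DEMO_AS_ROW) };
#undef DEMO_AS_ROW

// Count the stubs whose group mask intersects `groups`.
static int count_in_groups(int groups) {
    int n = 0;
    for (const DemoStubInfo& s : demo_stub_table)
        if (s.group & groups) ++n;
    return n;
}
```

The benefit of this design is that adding a stub means adding one row. Every expansion (enum, table, and in ittnotify presumably the pointer slots and default bodies) stays in sync automatically.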

@ -1,85 +1,34 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#ifndef _ITTNOTIFY_TYPES_H_
#define _ITTNOTIFY_TYPES_H_
typedef enum ___itt_group_id
{
__itt_group_none = 0,
__itt_group_legacy = 1<<0,
__itt_group_control = 1<<1,
__itt_group_thread = 1<<2,
__itt_group_mark = 1<<3,
__itt_group_sync = 1<<4,
__itt_group_fsync = 1<<5,
__itt_group_jit = 1<<6,
__itt_group_model = 1<<7,
__itt_group_splitter_min = 1<<7,
__itt_group_counter = 1<<8,
__itt_group_frame = 1<<9,
__itt_group_stitch = 1<<10,
__itt_group_heap = 1<<11,
__itt_group_splitter_max = 1<<12,
__itt_group_structure = 1<<12,
__itt_group_suppress = 1<<13,
__itt_group_arrays = 1<<14,
__itt_group_all = -1
__itt_group_none = 0,
__itt_group_legacy = 1<<0,
__itt_group_control = 1<<1,
__itt_group_thread = 1<<2,
__itt_group_mark = 1<<3,
__itt_group_sync = 1<<4,
__itt_group_fsync = 1<<5,
__itt_group_jit = 1<<6,
__itt_group_model = 1<<7,
__itt_group_splitter_min = 1<<7,
__itt_group_counter = 1<<8,
__itt_group_frame = 1<<9,
__itt_group_stitch = 1<<10,
__itt_group_heap = 1<<11,
__itt_group_splitter_max = 1<<12,
__itt_group_structure = 1<<12,
__itt_group_suppress = 1<<13,
__itt_group_arrays = 1<<14,
__itt_group_module = 1<<15,
__itt_group_all = -1
} __itt_group_id;
#pragma pack(push, 8)
@ -109,6 +58,7 @@ typedef struct ___itt_group_list
{ __itt_group_structure, "structure" }, \
{ __itt_group_suppress, "suppress" }, \
{ __itt_group_arrays, "arrays" }, \
{ __itt_group_module, "module" }, \
{ __itt_group_none, NULL } \
}
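The `__itt_group_id` values are bit flags, so a single stub can belong to several groups (`__itt_group_control | __itt_group_legacy`). The name table maps strings to those flags and ends with a `{ __itt_group_none, NULL }` sentinel.

Below is a minimal sketch of how such a table could be queried. It assumes a stub counts as enabled when its mask shares at least one bit with the enabled mask. That is an assumption about the collector's behaviour, not something taken from this diff. The `demo_` identifiers are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Simplified copy of a few __itt_group_id values from the enum above.
enum DemoGroup {
    demo_group_none      = 0,
    demo_group_legacy    = 1 << 0,
    demo_group_control   = 1 << 1,
    demo_group_structure = 1 << 12,
    demo_group_module    = 1 << 15,
    demo_group_all       = -1
};

struct DemoGroupName { DemoGroup id; const char* name; };

// Name table terminated by { none, NULL }, like the group list macro above.
static const DemoGroupName demo_group_names[] = {
    { demo_group_legacy,    "legacy" },
    { demo_group_control,   "control" },
    { demo_group_structure, "structure" },
    { demo_group_module,    "module" },
    { demo_group_none,      nullptr }
};

// Look up a group flag by name; unknown names map to demo_group_none.
static int group_from_name(const char* name) {
    for (const DemoGroupName* g = demo_group_names; g->name != nullptr; ++g)
        if (std::strcmp(g->name, name) == 0) return g->id;
    return demo_group_none;
}

// Assumed rule: a stub is enabled if its mask shares any bit with enabled_mask.
static bool is_enabled(int stub_groups, int enabled_mask) {
    return (stub_groups & enabled_mask) != 0;
}
```

Some enum entries share a value with another entry:

- `__itt_group_splitter_min` has the same value as `__itt_group_model` (`1<<7`).
- `__itt_group_splitter_max` has the same value as `__itt_group_structure` (`1<<12`).

The splitters therefore appear to mark range boundaries rather than distinct groups.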

@ -1,76 +1,24 @@
/* <copyright>
This file is provided under a dual BSD/GPLv2 license. When using or
redistributing this file, you may do so under either license.
/*
Copyright (C) 2005-2019 Intel Corporation
GPL LICENSE SUMMARY
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of version 2 of the GNU General Public License as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
The full GNU General Public License is included in this distribution
in the file called LICENSE.GPL.
Contact Information:
http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/
BSD LICENSE
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</copyright> */
SPDX-License-Identifier: GPL-2.0-only OR BSD-3-Clause
*/
#include "ittnotify_config.h"
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#include <windows.h>
#include <string.h>
#include <ctype.h>
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
#if ITT_PLATFORM != ITT_PLATFORM_MAC && ITT_PLATFORM != ITT_PLATFORM_FREEBSD
#if ITT_PLATFORM != ITT_PLATFORM_MAC && ITT_PLATFORM != ITT_PLATFORM_FREEBSD && ITT_PLATFORM != ITT_PLATFORM_OPENBSD
#include <malloc.h>
#endif
#include <stdlib.h>
#include "jitprofiling.h"
static const char rcsid[] = "\n@(#) $Revision: 471937 $\n";
#define DLL_ENVIRONMENT_VAR "VS_PROFILER"
static const char rcsid[] = "\n@(#) $Revision$\n";
#ifndef NEW_DLL_ENVIRONMENT_VAR
#if ITT_ARCH==ITT_ARCH_IA32
@ -81,13 +29,10 @@ static const char rcsid[] = "\n@(#) $Revision: 471937 $\n";
#endif /* NEW_DLL_ENVIRONMENT_VAR */
#if ITT_PLATFORM==ITT_PLATFORM_WIN
#define DEFAULT_DLLNAME "JitPI.dll"
HINSTANCE m_libHandle = NULL;
#elif ITT_PLATFORM==ITT_PLATFORM_MAC
#define DEFAULT_DLLNAME "libJitPI.dylib"
void* m_libHandle = NULL;
#else
#define DEFAULT_DLLNAME "libJitPI.so"
void* m_libHandle = NULL;
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
@ -169,6 +114,38 @@ ITT_EXTERN_C iJIT_IsProfilingActiveFlags JITAPI iJIT_IsProfilingActive()
return executionMode;
}
#if ITT_PLATFORM == ITT_PLATFORM_WIN
static int isValidAbsolutePath(char *path, size_t maxPathLength)
{
if (path == NULL)
{
return 0;
}
size_t pathLength = strnlen(path, maxPathLength);
if (pathLength == maxPathLength)
{
/* The strnlen() function returns maxPathLength if there is no null terminator
* among the first maxPathLength characters in the string pointed to by path.
*/
return 0;
}
if (pathLength > 2)
{
if (isalpha(path[0]) && path[1] == ':' && path[2] == '\\')
{
return 1;
}
else if (path[0] == '\\' && path[1] == '\\')
{
return 1;
}
}
return 0;
}
#endif
/* This function loads the collector dll and the relevant functions.
* on success: all functions load, iJIT_DLL_is_missing = 0, return value = 1
* on failure: all functions are NULL, iJIT_DLL_is_missing = 1, return value = 0
@ -212,7 +189,7 @@ static int loadiJIT_Funcs()
{
envret = GetEnvironmentVariableA(NEW_DLL_ENVIRONMENT_VAR,
dllName, dNameLength);
if (envret)
if (envret && isValidAbsolutePath(dllName, dNameLength))
{
/* Try to load the dll from the PATH... */
m_libHandle = LoadLibraryExA(dllName,
@ -220,30 +197,9 @@ static int loadiJIT_Funcs()
}
free(dllName);
}
} else {
/* Try to use old VS_PROFILER variable */
dNameLength = GetEnvironmentVariableA(DLL_ENVIRONMENT_VAR, NULL, 0);
if (dNameLength)
{
DWORD envret = 0;
dllName = (char*)malloc(sizeof(char) * (dNameLength + 1));
if(dllName != NULL)
{
envret = GetEnvironmentVariableA(DLL_ENVIRONMENT_VAR,
dllName, dNameLength);
if (envret)
{
/* Try to load the dll from the PATH... */
m_libHandle = LoadLibraryA(dllName);
}
free(dllName);
}
}
}
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
dllName = getenv(NEW_DLL_ENVIRONMENT_VAR);
if (!dllName)
dllName = getenv(DLL_ENVIRONMENT_VAR);
#if defined(__ANDROID__) || defined(ANDROID)
if (!dllName)
dllName = ANDROID_JIT_AGENT_PATH;
@ -251,19 +207,13 @@ static int loadiJIT_Funcs()
if (dllName)
{
/* Try to load the dll from the PATH... */
m_libHandle = dlopen(dllName, RTLD_LAZY);
if (DL_SYMBOLS)
{
m_libHandle = dlopen(dllName, RTLD_LAZY);
}
}
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
if (!m_libHandle)
{
#if ITT_PLATFORM==ITT_PLATFORM_WIN
m_libHandle = LoadLibraryA(DEFAULT_DLLNAME);
#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */
m_libHandle = dlopen(DEFAULT_DLLNAME, RTLD_LAZY);
#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */
}
/* if the dll wasn't loaded - exit. */
if (!m_libHandle)
{
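On Windows, `loadiJIT_Funcs()` now passes the `NEW_DLL_ENVIRONMENT_VAR` value to `LoadLibraryExA` only when `isValidAbsolutePath()` accepts it, and the legacy `VS_PROFILER` fallback has been removed. The likely aim is to keep a relative DLL name from being resolved through the DLL search order.

Below is a portable C++ restatement of that check, so it can be exercised off Windows. Two details are additions, not part of the original:

- `is_valid_absolute_path` is a hypothetical name.
- The cast to `unsigned char` before `std::isalpha` avoids undefined behaviour for negative `char` values.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Portable restatement of isValidAbsolutePath(): accepts "X:\..." drive paths
// and "\\..." UNC paths; rejects NULL, buffers with no terminator within
// max_len bytes, and anything of length <= 2.
static bool is_valid_absolute_path(const char* path, std::size_t max_len) {
    if (path == nullptr) return false;
    // Bounded length scan, equivalent to strnlen (which is POSIX/MSVC, not ISO C++).
    std::size_t len = 0;
    while (len < max_len && path[len] != '\0') ++len;
    if (len == max_len) return false;  // no terminator found within max_len bytes
    if (len > 2) {
        unsigned char c0 = static_cast<unsigned char>(path[0]);
        if (std::isalpha(c0) && path[1] == ':' && path[2] == '\\') return true;  // drive path
        if (path[0] == '\\' && path[1] == '\\') return true;                    // UNC path
    }
    return false;
}
```

Both this sketch and the original reject forward-slash forms such as `C:/Intel/JitPI.dll`, and they also reject bare file names.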

@ -1,23 +1,11 @@
project(kleidicv_hal)
set(KLEIDICV_SOURCE_PATH "" CACHE PATH "Directory containing KleidiCV sources")
ocv_update(KLEIDICV_SRC_COMMIT "0.1.0")
ocv_update(KLEIDICV_SRC_HASH "9388f28cf2fbe3338197b2b57d491468")
if(KLEIDICV_SOURCE_PATH)
set(THE_ROOT "${KLEIDICV_SOURCE_PATH}")
else()
ocv_download(FILENAME "kleidicv-${KLEIDICV_SRC_COMMIT}.tar.gz"
HASH ${KLEIDICV_SRC_HASH}
URL
"${OPENCV_KLEIDICV_URL}"
"$ENV{OPENCV_KLEIDICV_URL}"
"https://gitlab.arm.com/kleidi/kleidicv/-/archive/${KLEIDICV_SRC_COMMIT}/"
DESTINATION_DIR "${OpenCV_BINARY_DIR}/3rdparty/kleidicv/"
ID KLEIDICV
STATUS res
UNPACK RELATIVE_URL)
set(THE_ROOT "${OpenCV_BINARY_DIR}/3rdparty/kleidicv/kleidicv-${KLEIDICV_SRC_COMMIT}")
if(HAVE_KLEIDICV)
option(KLEIDICV_ENABLE_SME2 "" OFF) # not compatible with some CLang versions in NDK
include("${KLEIDICV_SOURCE_PATH}/adapters/opencv/CMakeLists.txt")
# HACK to suppress adapters/opencv/kleidicv_hal.cpp:343:12: warning: unused function 'from_opencv' [-Wunused-function]
target_compile_options( kleidicv_hal PRIVATE
$<TARGET_PROPERTY:kleidicv,COMPILE_OPTIONS>
"-Wno-old-style-cast" "-Wno-unused-function"
)
endif()
include("${THE_ROOT}/adapters/opencv/CMakeLists.txt")

3rdparty/kleidicv/kleidicv.cmake

@ -0,0 +1,21 @@
function(download_kleidicv root_var)
set(${root_var} "" PARENT_SCOPE)
ocv_update(KLEIDICV_SRC_COMMIT "0.3.0")
ocv_update(KLEIDICV_SRC_HASH "51a77b0185c2bac2a968a2163869b1ed")
set(THE_ROOT "${OpenCV_BINARY_DIR}/3rdparty/kleidicv")
ocv_download(FILENAME "kleidicv-${KLEIDICV_SRC_COMMIT}.tar.gz"
HASH ${KLEIDICV_SRC_HASH}
URL
"${OPENCV_KLEIDICV_URL}"
"$ENV{OPENCV_KLEIDICV_URL}"
"https://gitlab.arm.com/kleidi/kleidicv/-/archive/${KLEIDICV_SRC_COMMIT}/"
DESTINATION_DIR ${THE_ROOT}
ID KLEIDICV
STATUS res
UNPACK RELATIVE_URL)
if(res)
set(${root_var} "${OpenCV_BINARY_DIR}/3rdparty/kleidicv/kleidicv-${KLEIDICV_SRC_COMMIT}" PARENT_SCOPE)
endif()
endfunction()

@ -5,6 +5,8 @@
#ifndef OPENCV_NDSRVP_IMGPROC_HPP
#define OPENCV_NDSRVP_IMGPROC_HPP
struct cvhalFilter2D;
namespace cv {
namespace ndsrvp {
@ -71,6 +73,52 @@ int threshold(const uchar* src_data, size_t src_step,
#undef cv_hal_threshold
#define cv_hal_threshold (cv::ndsrvp::threshold)
// ################ filter ################
int filterInit(cvhalFilter2D **context,
uchar *kernel_data, size_t kernel_step,
int kernel_type, int kernel_width,
int kernel_height, int max_width, int max_height,
int src_type, int dst_type, int borderType,
double delta, int anchor_x, int anchor_y,
bool allowSubmatrix, bool allowInplace);
#undef cv_hal_filterInit
#define cv_hal_filterInit (cv::ndsrvp::filterInit)
int filter(cvhalFilter2D *context,
const uchar *src_data, size_t src_step,
uchar *dst_data, size_t dst_step,
int width, int height,
int full_width, int full_height,
int offset_x, int offset_y);
#undef cv_hal_filter
#define cv_hal_filter (cv::ndsrvp::filter)
int filterFree(cvhalFilter2D *context);
#undef cv_hal_filterFree
#define cv_hal_filterFree (cv::ndsrvp::filterFree)
// ################ medianBlur ################
int medianBlur(const uchar* src_data, size_t src_step,
uchar* dst_data, size_t dst_step,
int width, int height, int depth, int cn, int ksize);
#undef cv_hal_medianBlur
#define cv_hal_medianBlur (cv::ndsrvp::medianBlur)
// ################ bilateralFilter ################
int bilateralFilter(const uchar* src_data, size_t src_step,
uchar* dst_data, size_t dst_step, int width, int height, int depth,
int cn, int d, double sigma_color, double sigma_space, int border_type);
#undef cv_hal_bilateralFilter
#define cv_hal_bilateralFilter (cv::ndsrvp::bilateralFilter)
} // namespace ndsrvp
} // namespace cv
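The `#undef cv_hal_X` / `#define cv_hal_X (cv::ndsrvp::X)` pairs above replace OpenCV's default HAL hooks with the vendor implementations. When a vendor function returns `CV_HAL_ERROR_NOT_IMPLEMENTED`, the caller falls back to the generic code path. `bilateralFilter` does this, for example, for non-`CV_8U` input and for in-place calls.

The sketch below reproduces that override-and-fallback shape. The `demo_`/`vendor` names and the error codes are hypothetical. OpenCV's real defaults and dispatch macros live in its HAL headers and are not reproduced here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for CV_HAL_ERROR_OK / CV_HAL_ERROR_NOT_IMPLEMENTED.
enum { DEMO_HAL_OK = 0, DEMO_HAL_NOT_IMPLEMENTED = 1 };

// Default hook: reports "not implemented" so the generic path runs.
inline int demo_hal_ni_medianBlur(int depth, int ksize) {
    (void)depth; (void)ksize;
    return DEMO_HAL_NOT_IMPLEMENTED;
}
#define demo_hal_medianBlur demo_hal_ni_medianBlur

// A vendor header then replaces the hook, as ndsrvp_hal does above.
namespace vendor {
inline int medianBlur(int depth, int ksize) {
    // Accept only the case the vendor kernel supports; defer everything else.
    return (depth == 0 && ksize == 3) ? DEMO_HAL_OK : DEMO_HAL_NOT_IMPLEMENTED;
}
}  // namespace vendor
#undef demo_hal_medianBlur
#define demo_hal_medianBlur (vendor::medianBlur)

// Caller side: try the HAL first, fall back to generic code on NOT_IMPLEMENTED.
inline const char* run_median_blur(int depth, int ksize) {
    if (demo_hal_medianBlur(depth, ksize) == DEMO_HAL_OK)
        return "hal";
    return "generic";
}
```

Because the override happens at preprocessing time, the caller needs no runtime registry. The last `#define` seen before the call site wins.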

3rdparty/ndsrvp/src/bilateralFilter.cpp

@ -0,0 +1,270 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#include "ndsrvp_hal.hpp"
#include "opencv2/imgproc/hal/interface.h"
#include "cvutils.hpp"
namespace cv {
namespace ndsrvp {
static void bilateralFilterProcess(uchar* dst_data, size_t dst_step, uchar* pad_data, size_t pad_step,
int width, int height, int cn, int radius, int maxk,
int* space_ofs, float *space_weight, float *color_weight)
{
int i, j, k;
for( i = 0; i < height; i++ )
{
const uchar* sptr = pad_data + (i + radius) * pad_step + radius * cn;
uchar* dptr = dst_data + i * dst_step;
if( cn == 1 )
{
std::vector<float> buf(width + width, 0.0);
float *sum = &buf[0];
float *wsum = sum + width;
k = 0;
for(; k <= maxk-4; k+=4)
{
const uchar* ksptr0 = sptr + space_ofs[k];
const uchar* ksptr1 = sptr + space_ofs[k+1];
const uchar* ksptr2 = sptr + space_ofs[k+2];
const uchar* ksptr3 = sptr + space_ofs[k+3];
j = 0;
for (; j < width; j++)
{
int rval = sptr[j];
int val = ksptr0[j];
float w = space_weight[k] * color_weight[std::abs(val - rval)];
wsum[j] += w;
sum[j] += val * w;
val = ksptr1[j];
w = space_weight[k+1] * color_weight[std::abs(val - rval)];
wsum[j] += w;
sum[j] += val * w;
val = ksptr2[j];
w = space_weight[k+2] * color_weight[std::abs(val - rval)];
wsum[j] += w;
sum[j] += val * w;
val = ksptr3[j];
w = space_weight[k+3] * color_weight[std::abs(val - rval)];
wsum[j] += w;
sum[j] += val * w;
}
}
for(; k < maxk; k++)
{
const uchar* ksptr = sptr + space_ofs[k];
j = 0;
for (; j < width; j++)
{
int val = ksptr[j];
float w = space_weight[k] * color_weight[std::abs(val - sptr[j])];
wsum[j] += w;
sum[j] += val * w;
}
}
j = 0;
for (; j < width; j++)
{
// overflow is not possible here => there is no need to use cv::saturate_cast
ndsrvp_assert(fabs(wsum[j]) > 0);
dptr[j] = (uchar)(sum[j] / wsum[j] + 0.5);
}
}
else
{
ndsrvp_assert( cn == 3 );
std::vector<float> buf(width * 3 + width);
float *sum_b = &buf[0];
float *sum_g = sum_b + width;
float *sum_r = sum_g + width;
float *wsum = sum_r + width;
k = 0;
for(; k <= maxk-4; k+=4)
{
const uchar* ksptr0 = sptr + space_ofs[k];
const uchar* ksptr1 = sptr + space_ofs[k+1];
const uchar* ksptr2 = sptr + space_ofs[k+2];
const uchar* ksptr3 = sptr + space_ofs[k+3];
const uchar* rsptr = sptr;
j = 0;
for(; j < width; j++, rsptr += 3, ksptr0 += 3, ksptr1 += 3, ksptr2 += 3, ksptr3 += 3)
{
int rb = rsptr[0], rg = rsptr[1], rr = rsptr[2];
int b = ksptr0[0], g = ksptr0[1], r = ksptr0[2];
float w = space_weight[k] * color_weight[std::abs(b - rb) + std::abs(g - rg) + std::abs(r - rr)];
wsum[j] += w;
sum_b[j] += b * w; sum_g[j] += g * w; sum_r[j] += r * w;
b = ksptr1[0]; g = ksptr1[1]; r = ksptr1[2];
w = space_weight[k+1] * color_weight[std::abs(b - rb) + std::abs(g - rg) + std::abs(r - rr)];
wsum[j] += w;
sum_b[j] += b * w; sum_g[j] += g * w; sum_r[j] += r * w;
b = ksptr2[0]; g = ksptr2[1]; r = ksptr2[2];
w = space_weight[k+2] * color_weight[std::abs(b - rb) + std::abs(g - rg) + std::abs(r - rr)];
wsum[j] += w;
sum_b[j] += b * w; sum_g[j] += g * w; sum_r[j] += r * w;
b = ksptr3[0]; g = ksptr3[1]; r = ksptr3[2];
w = space_weight[k+3] * color_weight[std::abs(b - rb) + std::abs(g - rg) + std::abs(r - rr)];
wsum[j] += w;
sum_b[j] += b * w; sum_g[j] += g * w; sum_r[j] += r * w;
}
}
for(; k < maxk; k++)
{
const uchar* ksptr = sptr + space_ofs[k];
const uchar* rsptr = sptr;
j = 0;
for(; j < width; j++, ksptr += 3, rsptr += 3)
{
int b = ksptr[0], g = ksptr[1], r = ksptr[2];
float w = space_weight[k] * color_weight[std::abs(b - rsptr[0]) + std::abs(g - rsptr[1]) + std::abs(r - rsptr[2])];
wsum[j] += w;
sum_b[j] += b * w; sum_g[j] += g * w; sum_r[j] += r * w;
}
}
j = 0;
for(; j < width; j++)
{
ndsrvp_assert(fabs(wsum[j]) > 0);
wsum[j] = 1.f / wsum[j];
*(dptr++) = (uchar)(sum_b[j] * wsum[j] + 0.5);
*(dptr++) = (uchar)(sum_g[j] * wsum[j] + 0.5);
*(dptr++) = (uchar)(sum_r[j] * wsum[j] + 0.5);
}
}
}
}
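For every output pixel, `bilateralFilterProcess` accumulates `sum += val * w` and `wsum += w` over the precomputed neighbourhood offsets. The weight is `w = space_weight[k] * color_weight[|val - centre|]`. The three-channel branch sums the absolute B, G and R differences before indexing `color_weight`. Finally it writes `sum / wsum + 0.5`, cast to `uchar`, which rounds to nearest.

The naive single-channel reference below computes the same weighted average without the lookup tables or the four-way unrolling. That makes it handy for checking results on small inputs. Two differences from the real code: it clamps out-of-range neighbours to the edge instead of honouring `border_type`, and `bilateral_gray_reference` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Naive single-channel bilateral filter:
//   out(p) = sum_q w(p,q) * I(q) / sum_q w(p,q)
//   w(p,q) = exp(-|p-q|^2 / (2 s_space^2)) * exp(-(I(q)-I(p))^2 / (2 s_color^2))
// Neighbours lie in a disc of the given radius; out-of-range neighbours are
// clamped to the image edge (the HAL code pads according to border_type instead).
static std::vector<unsigned char> bilateral_gray_reference(
        const std::vector<unsigned char>& src, int width, int height,
        int radius, double sigma_color, double sigma_space) {
    std::vector<unsigned char> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int center = src[y * width + x];
            double sum = 0.0, wsum = 0.0;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    double r = std::sqrt(double(dy * dy + dx * dx));
                    if (r > radius) continue;  // same disc test as the space_ofs setup
                    int qy = std::min(std::max(y + dy, 0), height - 1);
                    int qx = std::min(std::max(x + dx, 0), width - 1);
                    int val = src[qy * width + qx];
                    int diff = std::abs(val - center);
                    double w = std::exp(-r * r / (2 * sigma_space * sigma_space)) *
                               std::exp(-double(diff * diff) / (2 * sigma_color * sigma_color));
                    sum += val * w;
                    wsum += w;
                }
            }
            dst[y * width + x] = static_cast<unsigned char>(sum / wsum + 0.5);
        }
    }
    return dst;
}
```

A flat image passes through unchanged. With a small `sigma_color`, pixels across a sharp step get a colour weight close to zero, so the edge is preserved instead of blurred.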
int bilateralFilter(const uchar* src_data, size_t src_step,
uchar* dst_data, size_t dst_step, int width, int height, int depth,
int cn, int d, double sigma_color, double sigma_space, int border_type)
{
if( depth != CV_8U || !(cn == 1 || cn == 3) || src_data == dst_data)
return CV_HAL_ERROR_NOT_IMPLEMENTED;
int i, j, maxk, radius;
if( sigma_color <= 0 )
sigma_color = 1;
if( sigma_space <= 0 )
sigma_space = 1;
double gauss_color_coeff = -0.5/(sigma_color * sigma_color);
double gauss_space_coeff = -0.5/(sigma_space * sigma_space);
if( d <= 0 )
radius = (int)(sigma_space * 1.5 + 0.5);
else
radius = d / 2;
radius = MAX(radius, 1);
d = radius * 2 + 1;
// not enough submatrix info
// fetch original image data
const uchar *ogn_data = src_data;
int ogn_step = src_step;
// ROI fully used in the computation
int cal_width = width + d - 1;
int cal_height = height + d - 1;
int cal_x = 0 - radius; // negative if left border exceeded
int cal_y = 0 - radius; // negative if top border exceeded
// calculate source border
std::vector<uchar> padding;
padding.resize(cal_width * cal_height * cn);
uchar* pad_data = &padding[0];
int pad_step = cal_width * cn;
uchar* pad_ptr;
const uchar* ogn_ptr;
std::vector<uchar> vec_zeros(cn, 0);
for(i = 0; i < cal_height; i++)
{
int y = borderInterpolate(i + cal_y, height, border_type);
if(y < 0) {
memset(pad_data + i * pad_step, 0, cn * cal_width);
continue;
}
// left border
j = 0;
for(; j + cal_x < 0; j++)
{
int x = borderInterpolate(j + cal_x, width, border_type);
if(x < 0) // BORDER_CONSTANT: borderInterpolate() returned -1, use zeros
ogn_ptr = &vec_zeros[0];
else
ogn_ptr = ogn_data + y * ogn_step + x * cn;
pad_ptr = pad_data + i * pad_step + j * cn;
memcpy(pad_ptr, ogn_ptr, cn);
}
// center
int rborder = MIN(cal_width, width - cal_x);
ogn_ptr = ogn_data + y * ogn_step + (j + cal_x) * cn;
pad_ptr = pad_data + i * pad_step + j * cn;
memcpy(pad_ptr, ogn_ptr, cn * (rborder - j));
// right border
j = rborder;
for(; j < cal_width; j++)
{
int x = borderInterpolate(j + cal_x, width, border_type);
if(x < 0) // BORDER_CONSTANT: borderInterpolate() returned -1, use zeros
ogn_ptr = &vec_zeros[0];
else
ogn_ptr = ogn_data + y * ogn_step + x * cn;
pad_ptr = pad_data + i * pad_step + j * cn;
memcpy(pad_ptr, ogn_ptr, cn);
}
}
std::vector<float> _color_weight(cn * 256);
std::vector<float> _space_weight(d * d);
std::vector<int> _space_ofs(d * d);
float* color_weight = &_color_weight[0];
float* space_weight = &_space_weight[0];
int* space_ofs = &_space_ofs[0];
// initialize color-related bilateral filter coefficients
for( i = 0; i < 256 * cn; i++ )
color_weight[i] = (float)std::exp(i * i * gauss_color_coeff);
// initialize space-related bilateral filter coefficients
for( i = -radius, maxk = 0; i <= radius; i++ )
{
j = -radius;
for( ; j <= radius; j++ )
{
double r = std::sqrt((double)i * i + (double)j * j);
if( r > radius )
continue;
space_weight[maxk] = (float)std::exp(r * r * gauss_space_coeff);
space_ofs[maxk++] = (int)(i * pad_step + j * cn);
}
}
bilateralFilterProcess(dst_data, dst_step, pad_data, pad_step, width, height, cn, radius, maxk, space_ofs, space_weight, color_weight);
return CV_HAL_ERROR_OK;
}
} // namespace ndsrvp
} // namespace cv
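As a cross-check for the HAL path above, the following is a minimal single-channel sketch of the same scheme, not the ndsrvp code: a 256-entry colour-weight table indexed by the absolute intensity difference (the 3-channel HAL version indexes a 256 * cn table by the sum of per-channel differences), Gaussian spatial weights restricted to a circle of the given radius, and normalisation by the accumulated weight. `bilateralGray` is a hypothetical name, and clamping (replicate) stands in for `borderInterpolate()` with an arbitrary `border_type`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical single-channel reference for the bilateral filter above.
// Border pixels are clamped (replicate), an assumption made for brevity.
std::vector<uint8_t> bilateralGray(const std::vector<uint8_t>& src, int width, int height,
                                   int radius, double sigmaColor, double sigmaSpace)
{
    const double gaussColor = -0.5 / (sigmaColor * sigmaColor);
    const double gaussSpace = -0.5 / (sigmaSpace * sigmaSpace);

    // Colour weights, indexed by the absolute intensity difference (0..255).
    float colorWeight[256];
    for (int i = 0; i < 256; i++)
        colorWeight[i] = (float)std::exp(i * i * gaussColor);

    // Spatial weights for every offset inside a circle of the given radius.
    struct Tap { int dy, dx; float w; };
    std::vector<Tap> taps;
    for (int dy = -radius; dy <= radius; dy++)
        for (int dx = -radius; dx <= radius; dx++) {
            double r = std::sqrt((double)dy * dy + (double)dx * dx);
            if (r > radius)
                continue;
            taps.push_back({dy, dx, (float)std::exp(r * r * gaussSpace)});
        }

    std::vector<uint8_t> dst(src.size());
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            const int center = src[y * width + x];
            float sum = 0.f, wsum = 0.f;
            for (const Tap& t : taps) {
                const int yy = std::min(std::max(y + t.dy, 0), height - 1);
                const int xx = std::min(std::max(x + t.dx, 0), width - 1);
                const int v = src[yy * width + xx];
                const float w = t.w * colorWeight[std::abs(v - center)];
                wsum += w;
                sum += v * w;
            }
            // The centre tap has weight 1 * 1, so wsum > 0.
            dst[y * width + x] = (uint8_t)(sum / wsum + 0.5f);
        }
    return dst;
}
```

With a small `sigmaColor`, pixels across a strong edge get a colour weight that underflows to zero, so the edge is preserved while flat regions are averaged.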


@ -73,6 +73,40 @@ int borderInterpolate(int p, int len, int borderType)
return p;
}
int16x4_t borderInterpolate_vector(int16x4_t vp, short len, int borderType)
{
int16x4_t vzero = (int16x4_t){0, 0, 0, 0};
int16x4_t vone = (int16x4_t){1, 1, 1, 1};
int16x4_t vlen = (int16x4_t){len, len, len, len};
if(borderType == CV_HAL_BORDER_REPLICATE)
vp = (int16x4_t)__nds__bpick(0, __nds__bpick((long)(vlen - 1), (long)vp, (long)(vp >= vlen)), (long)(vp < 0));
else if(borderType == CV_HAL_BORDER_REFLECT || borderType == CV_HAL_BORDER_REFLECT_101)
{
int16x4_t vdelta = (borderType == CV_HAL_BORDER_REFLECT_101) ? vone : vzero;
if(len == 1)
return vzero;
do
{
int16x4_t vneg = -vp - 1 + vdelta;
int16x4_t vpos = vlen - 1 - (vp - vlen) - vdelta;
vp = (int16x4_t)__nds__bpick((long)vneg, __nds__bpick((long)vpos, (long)vp, (long)(vp >= vlen)), (long)(vp < 0));
}
while( (long)(vp >= vlen) || (long)(vp < 0) );
}
else if(borderType == CV_HAL_BORDER_WRAP)
{
ndsrvp_assert(len > 0);
int16x4_t vneg = vp - ((vp - vlen + 1) / vlen) * vlen;
int16x4_t vpos = vp % vlen;
vp = (int16x4_t)__nds__bpick((long)vneg, __nds__bpick((long)vpos, (long)vp, (long)(vp >= vlen)), (long)(vp < 0));
}
else if(borderType == CV_HAL_BORDER_CONSTANT)
vp = (int16x4_t)__nds__bpick((long)-vone, (long)vp, (long)(vp < 0 || vp >= vlen));
else
ndsrvp_error(Error::StsBadArg, "borderInterpolate_vector(): Unknown/unsupported border type");
return vp;
}
} // namespace ndsrvp
} // namespace cv
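The vector routine applies the scalar `borderInterpolate()` rules lane by lane through `__nds__bpick` masks. A scalar reference like the one below is handy for checking it; the names are hypothetical and the `Border` enum only stands in for the `CV_HAL_BORDER_*` constants. For `len = 5`, the out-of-range indices `-2, -1, 5, 6` map to `0 0 4 4` (replicate), `1 0 4 3` (reflect), `2 1 3 2` (reflect-101), `3 4 0 1` (wrap) and `-1` (constant).

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the CV_HAL_BORDER_* constants.
enum class Border { Constant, Replicate, Reflect, Reflect101, Wrap };

// Scalar rule applied to one index, following borderInterpolate().
int interp(int p, int len, Border b)
{
    if (p >= 0 && p < len)
        return p;                                 // in range: unchanged
    switch (b) {
    case Border::Replicate:
        return p < 0 ? 0 : len - 1;
    case Border::Reflect:
    case Border::Reflect101: {
        if (len == 1)
            return 0;
        const int delta = (b == Border::Reflect101) ? 1 : 0;
        do {                                      // repeat for far-out indices
            p = p < 0 ? -p - 1 + delta : len - 1 - (p - len) - delta;
        } while (p < 0 || p >= len);
        return p;
    }
    case Border::Wrap:
        if (p < 0)
            p -= ((p - len + 1) / len) * len;
        if (p >= len)
            p %= len;
        return p;
    default:
        return -1;                                // Constant: caller uses zeros
    }
}

// Four-lane version, mirroring borderInterpolate_vector() on int16x4_t.
std::array<short, 4> interp4(std::array<short, 4> vp, short len, Border b)
{
    for (short& p : vp)
        p = (short)interp(p, len, b);
    return vp;
}
```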


@ -14,6 +14,7 @@
#include <iostream>
#include <string>
#include <array>
#include <vector>
#include <climits>
#include <algorithm>
@ -26,16 +27,26 @@ namespace ndsrvp {
void* fastMalloc(size_t size);
void fastFree(void* ptr);
int borderInterpolate(int p, int len, int borderType);
int16x4_t borderInterpolate_vector(int16x4_t vp, short len, int borderType);
#ifndef MAX
# define MAX(a,b) ((a) < (b) ? (b) : (a))
#endif
#ifndef MIN
# define MIN(a,b) ((a) > (b) ? (b) : (a))
#endif
#define CV_MAT_CN_MASK ((CV_CN_MAX - 1) << CV_CN_SHIFT)
#define CV_MAT_CN(flags) ((((flags) & CV_MAT_CN_MASK) >> CV_CN_SHIFT) + 1)
#define CV_ELEM_SIZE1(type) ((0x28442211 >> CV_MAT_DEPTH(type)*4) & 15)
#define CV_ELEM_SIZE(type) (CV_MAT_CN(type)*CV_ELEM_SIZE1(type))
#define CV_MALLOC_ALIGN 64
inline size_t getElemSize(int type) { return (size_t)CV_ELEM_SIZE(type); }
// error codes
enum Error{
@ -69,6 +80,135 @@ inline int32x2_t vclip(int32x2_t x, int32x2_t a, int32x2_t b)
return (int32x2_t)__nds__bpick((long)a, __nds__bpick((long)(b - 1), (long)x, (long)(x < b)), (long)(x >= a));
}
// expand
/*
[0] [1] [2] [3] [4] [5] [6] [7]
810 [ 0 ] [ 1 ] [ 4 ] [ 5 ]
832 [ 2 ] [ 3 ] [ 6 ] [ 7 ]
bb [ 0 ] [ 1 ] [ 2 ] [ 3 ]
tt [ 4 ] [ 5 ] [ 6 ] [ 7 ]
*/
inline void ndsrvp_u8_u16_expand8(const unsigned long vs, ushort* dst)
{
unsigned long vs810 = __nds__zunpkd810(vs);
unsigned long vs832 = __nds__zunpkd832(vs);
*(unsigned long*)dst = __nds__pkbb32(vs832, vs810);
*(unsigned long*)(dst + 4) = __nds__pktt32(vs832, vs810);
}
/*
[0] [1] [2] [3] [4] [5] [6] [7]
820 [ 0 ] [ 2 ] [ 4 ] [ 6 ]
831 [ 1 ] [ 3 ] [ 5 ] [ 7 ]
bb [ 0 ] [ 2 ] [ 1 ] [ 3 ]
tt [ 4 ] [ 6 ] [ 5 ] [ 7 ]
*/
inline void ndsrvp_u8_u16_eswap8(const unsigned long vs, ushort* dst)
{
unsigned long vs820 = __nds__zunpkd820(vs);
unsigned long vs831 = __nds__zunpkd831(vs);
*(unsigned long*)dst = __nds__pkbb32(vs831, vs820);
*(unsigned long*)(dst + 4) = __nds__pktt32(vs831, vs820);
}
/*
[0] [1] [2] [3] [4] [5] [6] [7]
820 [ 0 ] [ 2 ] [ 4 ] [ 6 ]
831 [ 1 ] [ 3 ] [ 5 ] [ 7 ]
bb [ 0 ] [ 2 ] [ 1 ] [ 3 ]
tt [ 4 ] [ 6 ] [ 5 ] [ 7 ]
bbbb[ 0 ] [ 1 ]
bbtt[ 2 ] [ 3 ]
ttbb[ 4 ] [ 5 ]
tttt[ 6 ] [ 7 ]
*/
inline void ndsrvp_u8_u32_expand8(const unsigned long vs, uint* dst)
{
unsigned long vs820 = __nds__zunpkd820(vs);
unsigned long vs831 = __nds__zunpkd831(vs);
unsigned long vsbb = __nds__pkbb32(vs831, vs820);
unsigned long vstt = __nds__pktt32(vs831, vs820);
*(unsigned long*)dst = __nds__pkbb16(0, vsbb);
*(unsigned long*)(dst + 2) = __nds__pktt16(0, vsbb);
*(unsigned long*)(dst + 4) = __nds__pkbb16(0, vstt);
*(unsigned long*)(dst + 6) = __nds__pktt16(0, vstt);
}
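The lane diagrams above can be expressed as a portable reference, which is useful for testing the packed-SIMD helpers on a host machine. This is a sketch under the assumption that lane `[i]` is byte `i` of the little-endian 64-bit word; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: lane [i] is byte i of the little-endian 64-bit word `vs`.
inline uint8_t laneAt(uint64_t vs, int i) { return (uint8_t)(vs >> (8 * i)); }

// ndsrvp_u8_u16_expand8: eight u8 lanes widened to u16, natural order.
std::array<uint16_t, 8> refExpand8(uint64_t vs)
{
    std::array<uint16_t, 8> d{};
    for (int i = 0; i < 8; i++)
        d[i] = laneAt(vs, i);
    return d;
}

// ndsrvp_u8_u16_eswap8: order [0 2 1 3 | 4 6 5 7]. Viewed as 32-bit words, the
// bottom halves then hold lanes 0,1 (4,5) and the top halves lanes 2,3 (6,7),
// which is the split that kmmwb2/kmmwt2 consume in ndsrvp_f32_u8_mul8.
std::array<uint16_t, 8> refEswap8(uint64_t vs)
{
    static const int order[8] = {0, 2, 1, 3, 4, 6, 5, 7};
    std::array<uint16_t, 8> d{};
    for (int i = 0; i < 8; i++)
        d[i] = laneAt(vs, order[i]);
    return d;
}

// ndsrvp_u8_u32_expand8: eight u8 lanes widened to u32, natural order.
std::array<uint32_t, 8> refExpand8To32(uint64_t vs)
{
    std::array<uint32_t, 8> d{};
    for (int i = 0; i < 8; i++)
        d[i] = laneAt(vs, i);
    return d;
}
```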
// float replacement
inline void ndsrvp_f32_add8(const float* a, const float* b, float* c)
{
c[0] = a[0] + b[0];
c[1] = a[1] + b[1];
c[2] = a[2] + b[2];
c[3] = a[3] + b[3];
c[4] = a[4] + b[4];
c[5] = a[5] + b[5];
c[6] = a[6] + b[6];
c[7] = a[7] + b[7];
}
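/*
float32 bit layout: [sign 1] [exponent 8] [fraction 23]
the 24-bit mantissa (fraction plus implicit leading 1) is multiplied by an 8-bit unsigned value
*/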
inline void ndsrvp_f32_u8_mul8(const float* a, const unsigned long b, float* c) // experimental, not bit exact
{
const int mask_frac = 0x007FFFFF;
const int mask_sign = 0x7FFFFFFF;
const int mask_lead = 0x40000000;
const int ofs_exp = 23;
uint32x2_t va01 = *(uint32x2_t*)a;
uint32x2_t va23 = *(uint32x2_t*)(a + 2);
uint32x2_t va45 = *(uint32x2_t*)(a + 4);
uint32x2_t va67 = *(uint32x2_t*)(a + 6);
uint32x2_t vaexp01 = va01 >> ofs_exp;
uint32x2_t vaexp23 = va23 >> ofs_exp;
uint32x2_t vaexp45 = va45 >> ofs_exp;
uint32x2_t vaexp67 = va67 >> ofs_exp;
uint32x2_t vafrac01 = ((va01 << 7) & mask_sign) | mask_lead;
uint32x2_t vafrac23 = ((va23 << 7) & mask_sign) | mask_lead;
uint32x2_t vafrac45 = ((va45 << 7) & mask_sign) | mask_lead;
uint32x2_t vafrac67 = ((va67 << 7) & mask_sign) | mask_lead;
int16x4_t vb[2]; // fake signed for signed multiply
ndsrvp_u8_u16_eswap8(b, (ushort*)vb);
vafrac01 = (uint32x2_t)__nds__kmmwb2_u((long)vafrac01, (unsigned long)vb[0]);
vafrac23 = (uint32x2_t)__nds__kmmwt2_u((long)vafrac23, (unsigned long)vb[0]);
vafrac45 = (uint32x2_t)__nds__kmmwb2_u((long)vafrac45, (unsigned long)vb[1]);
vafrac67 = (uint32x2_t)__nds__kmmwt2_u((long)vafrac67, (unsigned long)vb[1]);
uint32x2_t vaclz01 = __nds__v_clz32(vafrac01) - 8;
uint32x2_t vaclz23 = __nds__v_clz32(vafrac23) - 8;
uint32x2_t vaclz45 = __nds__v_clz32(vafrac45) - 8;
uint32x2_t vaclz67 = __nds__v_clz32(vafrac67) - 8;
vaexp01 += 8 - vaclz01;
vaexp23 += 8 - vaclz23;
vaexp45 += 8 - vaclz45;
vaexp67 += 8 - vaclz67;
vafrac01 <<= vaclz01;
vafrac23 <<= vaclz23;
vafrac45 <<= vaclz45;
vafrac67 <<= vaclz67;
*(uint32x2_t*)c = (vaexp01 << ofs_exp) | (vafrac01 & mask_frac);
*(uint32x2_t*)(c + 2) = (vaexp23 << ofs_exp) | (vafrac23 & mask_frac);
*(uint32x2_t*)(c + 4) = (vaexp45 << ofs_exp) | (vafrac45 & mask_frac);
*(uint32x2_t*)(c + 6) = (vaexp67 << ofs_exp) | (vafrac67 & mask_frac);
}
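The idea behind `ndsrvp_f32_u8_mul8` is to multiply only the 24-bit mantissa by the 8-bit integer and then renormalise the exponent. The scalar sketch below shows that technique for one value. It is an illustration, not a transcription: it assumes a positive, normal `a` and no overflow, finds the shift with a loop where the HAL uses `__nds__v_clz32`, and truncates the low bits, so like the HAL routine it is not bit exact with `a * b`. `mulFloatByU8` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar illustration of the mantissa-times-integer idea.
// Assumes a > 0, a is a normal float and the product does not overflow.
float mulFloatByU8(float a, uint8_t b)
{
    if (b == 0)
        return 0.0f;
    uint32_t bits;
    std::memcpy(&bits, &a, sizeof bits);
    const uint32_t exp = bits >> 23;                          // biased exponent (sign bit is 0)
    const uint64_t mant = (bits & 0x007FFFFFu) | 0x00800000u; // 24-bit mantissa with implicit 1
    const uint64_t prod = mant * b;                           // at most 32 significant bits
    // Renormalise so the leading 1 is back at bit 23; each shift bumps the exponent.
    uint32_t shift = 0;
    while ((prod >> shift) >= (1u << 24))
        ++shift;
    const uint32_t out = ((exp + shift) << 23) | (uint32_t)((prod >> shift) & 0x007FFFFFu);
    float r;
    std::memcpy(&r, &out, sizeof r);
    return r;   // low bits were truncated, so this can differ from a * b in the last ulp
}
```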
// saturate
template<typename _Tp> static inline _Tp saturate_cast(int v) { return _Tp(v); }
@ -94,6 +234,26 @@ template<> inline short saturate_cast<short>(double v) { return saturate_cas
template<> inline int saturate_cast<int>(float v) { return (int)lrintf(v); }
template<> inline int saturate_cast<int>(double v) { return (int)lrint(v); }
inline double cast_ptr_to_double(const uchar* v, int depth) {
switch (depth) {
case CV_8U: return (double)*(uchar*)v;
case CV_8S: return (double)*(char*)v;
case CV_16U: return (double)*(ushort*)v;
case CV_16S: return (double)*(short*)v;
case CV_32S: return (double)*(int*)v;
case CV_32F: return (double)*(float*)v;
case CV_64F: return (double)*(double*)v;
case CV_16F: return (double)*(float*)v;
default: return 0;
}
}
template <typename _Tp>
inline _Tp data_at(const uchar* data, int step, int y, int x, int cn)
{
return ((_Tp*)(data + y * step))[x * cn];
}
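To make the `step`/`cn` arithmetic in `data_at()` concrete: `step` is the row pitch in bytes (it may include padding), and `x * cn` selects channel 0 of pixel `x` in the typed row, so other channels are reached by offsetting the base pointer. The copy below (`dataAtSketch`, a hypothetical name) is exercised on a padded 2 x 2, 3-channel `uint16_t` buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same addressing rule as data_at(): step is in bytes, x * cn indexes the typed row.
template <typename T>
T dataAtSketch(const unsigned char* data, int step, int y, int x, int cn)
{
    return reinterpret_cast<const T*>(data + y * step)[x * cn];
}
```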
// align
inline long align(size_t v, int n)

3rdparty/ndsrvp/src/filter.cpp

@ -0,0 +1,321 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#include "ndsrvp_hal.hpp"
#include "opencv2/imgproc/hal/interface.h"
#include "cvutils.hpp"
namespace cv {
namespace ndsrvp {
class FilterData
{
public:
FilterData(uchar *_kernel_data, size_t _kernel_step, int _kernel_type, int _src_type, int _dst_type, int _borderType,
int _kernel_width, int _kernel_height, int _max_width, int _max_height, double _delta, int _anchor_x, int _anchor_y)
: kernel_data(_kernel_data), kernel_step(_kernel_step), kernel_type(_kernel_type), src_type(_src_type), dst_type(_dst_type), borderType(_borderType),
kernel_width(_kernel_width), kernel_height(_kernel_height), max_width(_max_width), max_height(_max_height), delta(_delta), anchor_x(_anchor_x), anchor_y(_anchor_y)
{
}
uchar *kernel_data;
size_t kernel_step; // bytes between rows(height)
int kernel_type, src_type, dst_type, borderType;
int kernel_width, kernel_height;
int max_width, max_height;
double delta;
int anchor_x, anchor_y;
std::vector<uchar> coords;
std::vector<float> coeffs;
int nz;
std::vector<uchar> padding;
};
static int countNonZero(const FilterData* ctx)
{
int i, j, nz = 0;
const uchar* ker_row = ctx->kernel_data;
for( i = 0; i < ctx->kernel_height; i++, ker_row += ctx->kernel_step )
{
for( j = 0; j < ctx->kernel_width; j++ )
{
if( ((float*)ker_row)[j] != 0.0 )
nz++;
}
}
return nz;
}
static void preprocess2DKernel(FilterData* ctx)
{
int i, j, k, nz = countNonZero(ctx), ktype = ctx->kernel_type;
if(nz == 0)
nz = 1; // keep one zero-filled slot so &coords[0] and &coeffs[0] stay valid; ctx->nz is still set to 0 below
ndsrvp_assert( ktype == CV_32F );
ctx->coords.resize(nz * 2);
ctx->coeffs.resize(nz);
const uchar* ker_row = ctx->kernel_data;
for( i = k = 0; i < ctx->kernel_height; i++, ker_row += ctx->kernel_step )
{
for( j = 0; j < ctx->kernel_width; j++ )
{
float val = ((float*)ker_row)[j];
if( val == 0.0 )
continue;
ctx->coords[k * 2] = j;
ctx->coords[k * 2 + 1] = i;
ctx->coeffs[k++] = val;
}
}
ctx->nz = k;
}
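`preprocess2DKernel()` turns the dense kernel into a list of non-zero taps so that the inner loop of `filter()` runs `nz` times instead of `kernel_width * kernel_height`. A minimal sketch of that conversion follows; the names are hypothetical, and it stores coordinates as `int`, whereas the HAL packs them into `uchar` (which limits kernels to 256 x 256).

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the non-zero taps of a kernel.
struct SparseKernel {
    std::vector<int> xs, ys;     // tap positions (column, row) inside the kernel
    std::vector<float> coeffs;   // matching coefficients
};

// Scan a dense row-major float kernel and keep only its non-zero entries.
SparseKernel toSparse(const std::vector<float>& k, int kw, int kh)
{
    SparseKernel s;
    for (int y = 0; y < kh; y++)
        for (int x = 0; x < kw; x++) {
            const float v = k[y * kw + x];
            if (v == 0.0f)
                continue;
            s.xs.push_back(x);
            s.ys.push_back(y);
            s.coeffs.push_back(v);
        }
    return s;
}
```

For a 3 x 3 Laplacian only 5 of the 9 entries survive, so the accumulation loop does roughly half the work.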
int filterInit(cvhalFilter2D **context,
uchar *kernel_data, size_t kernel_step,
int kernel_type, int kernel_width,
int kernel_height, int max_width, int max_height,
int src_type, int dst_type, int borderType,
double delta, int anchor_x, int anchor_y,
bool allowSubmatrix, bool allowInplace)
{
int sdepth = CV_MAT_DEPTH(src_type), ddepth = CV_MAT_DEPTH(dst_type);
int cn = CV_MAT_CN(src_type), kdepth = kernel_type;
(void)allowSubmatrix;
(void)allowInplace;
if(delta - (int)delta != 0.0)
return CV_HAL_ERROR_NOT_IMPLEMENTED;
if(kdepth != CV_32F || (sdepth != CV_8U && sdepth != CV_16U) || ddepth != sdepth)
return CV_HAL_ERROR_NOT_IMPLEMENTED;
FilterData *ctx = new FilterData(kernel_data, kernel_step, kernel_type, src_type, dst_type, borderType,
kernel_width, kernel_height, max_width, max_height, delta, anchor_x, anchor_y);
*context = (cvhalFilter2D*)ctx;
ndsrvp_assert(cn == CV_MAT_CN(dst_type) && ddepth >= sdepth);
preprocess2DKernel(ctx);
return CV_HAL_ERROR_OK;
}
int filter(cvhalFilter2D *context,
const uchar *src_data, size_t src_step,
uchar *dst_data, size_t dst_step,
int width, int height,
int full_width, int full_height,
int offset_x, int offset_y)
{
FilterData *ctx = (FilterData*)context;
int cn = CV_MAT_CN(ctx->src_type);
int cnes = CV_ELEM_SIZE(ctx->src_type);
int ddepth = CV_MAT_DEPTH(ctx->dst_type);
float delta_sat = (uchar)(ctx->delta);
if(ddepth == CV_8U)
delta_sat = (float)saturate_cast<uchar>(ctx->delta);
else if(ddepth == CV_16U)
delta_sat = (float)saturate_cast<ushort>(ctx->delta);
// fetch original image data
const uchar *ogn_data = src_data - offset_y * src_step - offset_x * cnes;
int ogn_step = src_step;
// ROI fully used in the computation
int cal_width = width + ctx->kernel_width - 1;
int cal_height = height + ctx->kernel_height - 1;
int cal_x = offset_x - ctx->anchor_x; // negative if left border exceeded
int cal_y = offset_y - ctx->anchor_y; // negative if top border exceeded
// calculate source border
ctx->padding.resize(cal_width * cal_height * cnes);
uchar* pad_data = &ctx->padding[0];
int pad_step = cal_width * cnes;
uchar* pad_ptr;
const uchar* ogn_ptr;
std::vector<uchar> vec_zeros(cnes, 0);
for(int i = 0; i < cal_height; i++)
{
int y = borderInterpolate(i + cal_y, full_height, ctx->borderType);
if(y < 0) {
memset(pad_data + i * pad_step, 0, cnes * cal_width);
continue;
}
// left border
int j = 0;
int16x4_t vj = {0, 1, 2, 3};
vj += saturate_cast<short>(cal_x);
for(; j + cal_x < -4; j += 4, vj += 4)
{
int16x4_t vx = borderInterpolate_vector(vj, full_width, ctx->borderType);
for(int k = 0; k < 4; k++) {
if(vx[k] < 0) // BORDER_CONSTANT: lane returned -1, use zeros
ogn_ptr = &vec_zeros[0];
else
ogn_ptr = ogn_data + y * ogn_step + vx[k] * cnes;
pad_ptr = pad_data + i * pad_step + (j + k) * cnes;
memcpy(pad_ptr, ogn_ptr, cnes);
}
}
for(; j + cal_x < 0; j++)
{
int x = borderInterpolate(j + cal_x, full_width, ctx->borderType);
if(x < 0) // BORDER_CONSTANT: borderInterpolate() returned -1, use zeros
ogn_ptr = &vec_zeros[0];
else
ogn_ptr = ogn_data + y * ogn_step + x * cnes;
pad_ptr = pad_data + i * pad_step + j * cnes;
memcpy(pad_ptr, ogn_ptr, cnes);
}
// center
int rborder = MIN(cal_width, full_width - cal_x);
ogn_ptr = ogn_data + y * ogn_step + (j + cal_x) * cnes;
pad_ptr = pad_data + i * pad_step + j * cnes;
memcpy(pad_ptr, ogn_ptr, cnes * (rborder - j));
// right border
j = rborder;
vj = (int16x4_t){0, 1, 2, 3} + saturate_cast<short>(cal_x + rborder);
for(; j <= cal_width - 4; j += 4, vj += 4)
{
int16x4_t vx = borderInterpolate_vector(vj, full_width, ctx->borderType);
for(int k = 0; k < 4; k++) {
if(vx[k] < 0) // BORDER_CONSTANT: lane returned -1, use zeros
ogn_ptr = &vec_zeros[0];
else
ogn_ptr = ogn_data + y * ogn_step + vx[k] * cnes;
pad_ptr = pad_data + i * pad_step + (j + k) * cnes;
memcpy(pad_ptr, ogn_ptr, cnes);
}
}
for(; j < cal_width; j++)
{
int x = borderInterpolate(j + cal_x, full_width, ctx->borderType);
if(x < 0) // BORDER_CONSTANT: borderInterpolate() returned -1, use zeros
ogn_ptr = &vec_zeros[0];
else
ogn_ptr = ogn_data + y * ogn_step + x * cnes;
pad_ptr = pad_data + i * pad_step + j * cnes;
memcpy(pad_ptr, ogn_ptr, cnes);
}
}
// prepare the pointers
int i, k, count, nz = ctx->nz;
const uchar* ker_pts = &ctx->coords[0];
const float* ker_cfs = &ctx->coeffs[0];
if( ddepth == CV_8U )
{
std::vector<uchar*> src_ptrarr;
src_ptrarr.resize(nz);
uchar** src_ptrs = &src_ptrarr[0];
uchar* dst_row = dst_data;
uchar* pad_row = pad_data;
for( count = 0; count < height; count++, dst_row += dst_step, pad_row += pad_step )
{
for( k = 0; k < nz; k++ )
src_ptrs[k] = (uchar*)pad_row + ker_pts[k * 2 + 1] * pad_step + ker_pts[k * 2] * cnes;
i = 0;
for( ; i <= width * cnes - 8; i += 8 )
{
float vs0[8] = {delta_sat, delta_sat, delta_sat, delta_sat, delta_sat, delta_sat, delta_sat, delta_sat};
for( k = 0; k < nz; k++ ) {
float vker_cfs[8] = {ker_cfs[k], ker_cfs[k], ker_cfs[k], ker_cfs[k], ker_cfs[k], ker_cfs[k], ker_cfs[k], ker_cfs[k]};
// experimental code
// ndsrvp_f32_u8_mul8(vker_cfs, *(unsigned long*)(src_ptrs[k] + i), vker_cfs);
// ndsrvp_f32_add8(vs0, vker_cfs, vs0);
vs0[0] += vker_cfs[0] * src_ptrs[k][i];
vs0[1] += vker_cfs[1] * src_ptrs[k][i + 1];
vs0[2] += vker_cfs[2] * src_ptrs[k][i + 2];
vs0[3] += vker_cfs[3] * src_ptrs[k][i + 3];
vs0[4] += vker_cfs[4] * src_ptrs[k][i + 4];
vs0[5] += vker_cfs[5] * src_ptrs[k][i + 5];
vs0[6] += vker_cfs[6] * src_ptrs[k][i + 6];
vs0[7] += vker_cfs[7] * src_ptrs[k][i + 7];
}
dst_row[i] = saturate_cast<uchar>(vs0[0]);
dst_row[i + 1] = saturate_cast<uchar>(vs0[1]);
dst_row[i + 2] = saturate_cast<uchar>(vs0[2]);
dst_row[i + 3] = saturate_cast<uchar>(vs0[3]);
dst_row[i + 4] = saturate_cast<uchar>(vs0[4]);
dst_row[i + 5] = saturate_cast<uchar>(vs0[5]);
dst_row[i + 6] = saturate_cast<uchar>(vs0[6]);
dst_row[i + 7] = saturate_cast<uchar>(vs0[7]);
}
for( ; i < width * cnes; i++ )
{
float s0 = delta_sat;
for( k = 0; k < nz; k++ ) {
s0 += ker_cfs[k] * src_ptrs[k][i];
}
dst_row[i] = saturate_cast<uchar>(s0);
}
}
}
else if( ddepth == CV_16U )
{
std::vector<ushort*> src_ptrarr;
src_ptrarr.resize(nz);
ushort** src_ptrs = &src_ptrarr[0];
uchar* dst_row = dst_data;
uchar* pad_row = pad_data;
for( count = 0; count < height; count++, dst_row += dst_step, pad_row += pad_step )
{
for( k = 0; k < nz; k++ )
src_ptrs[k] = (ushort*)((uchar*)pad_row + ker_pts[k * 2 + 1] * pad_step + ker_pts[k * 2] * cnes);
i = 0;
for( ; i <= width * cn - 4; i += 4 )
{
float vs0[4] = {delta_sat, delta_sat, delta_sat, delta_sat};
for( k = 0; k < nz; k++ ) {
float vker_cfs[4] = {ker_cfs[k], ker_cfs[k], ker_cfs[k], ker_cfs[k]};
vs0[0] += vker_cfs[0] * src_ptrs[k][i];
vs0[1] += vker_cfs[1] * src_ptrs[k][i + 1];
vs0[2] += vker_cfs[2] * src_ptrs[k][i + 2];
vs0[3] += vker_cfs[3] * src_ptrs[k][i + 3];
}
ushort* dst_row_ptr = (ushort*)dst_row;
dst_row_ptr[i] = saturate_cast<ushort>(vs0[0]);
dst_row_ptr[i + 1] = saturate_cast<ushort>(vs0[1]);
dst_row_ptr[i + 2] = saturate_cast<ushort>(vs0[2]);
dst_row_ptr[i + 3] = saturate_cast<ushort>(vs0[3]);
}
for( ; i < width * cn; i++ )
{
float s0 = delta_sat;
for( k = 0; k < nz; k++ ) {
s0 += ker_cfs[k] * src_ptrs[k][i];
}
((ushort*)dst_row)[i] = saturate_cast<ushort>(s0);
}
}
}
return CV_HAL_ERROR_OK;
}
int filterFree(cvhalFilter2D *context) {
FilterData *ctx = (FilterData*)context;
delete ctx;
return CV_HAL_ERROR_OK;
}
} // namespace ndsrvp
} // namespace cv
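Putting the pieces of `filter()` together: build a padded copy whose origin is shifted by the anchor (the `cal_x`/`cal_y` logic), then for each output pixel sum `coeff * pixel` over the sparse taps, add `delta` and saturate. The sketch below assumes a single channel, a whole image rather than a ROI, and a replicate (clamp) border; `filterSparse` and `SparseTap` are hypothetical names. Rounding uses `std::lrint`, in the spirit of the `lrintf`-based `saturate_cast` overloads shown earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// One non-zero kernel coefficient at kernel position (x, y). Hypothetical type.
struct SparseTap { int x, y; float c; };

// Hypothetical single-channel sketch of pad + sparse-tap filtering.
// (ax, ay) is the kernel anchor; the border is replicated by clamping.
std::vector<uint8_t> filterSparse(const std::vector<uint8_t>& src, int w, int h,
                                  const std::vector<SparseTap>& taps, int kw, int kh,
                                  int ax, int ay, float delta)
{
    // Padded buffer: column j maps to source column j - ax (cal_x = -ax), row i to i - ay.
    const int pw = w + kw - 1, ph = h + kh - 1;
    std::vector<uint8_t> pad(pw * ph);
    for (int i = 0; i < ph; i++)
        for (int j = 0; j < pw; j++) {
            const int y = std::min(std::max(i - ay, 0), h - 1);
            const int x = std::min(std::max(j - ax, 0), w - 1);
            pad[i * pw + j] = src[y * w + x];
        }

    std::vector<uint8_t> dst(w * h);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float s = delta;
            for (const SparseTap& t : taps)
                s += t.c * pad[(y + t.y) * pw + (x + t.x)];
            // Round to nearest, then clamp to [0, 255] (saturating cast).
            const long r = std::lrint(s);
            dst[y * w + x] = (uint8_t)std::min(std::max(r, 0L), 255L);
        }
    return dst;
}
```

Padding once per call keeps the inner loop free of border checks, which is the same trade-off the HAL makes with `ctx->padding`.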

3rdparty/ndsrvp/src/medianBlur.cpp

@ -0,0 +1,300 @@
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#include "ndsrvp_hal.hpp"
#include "opencv2/imgproc/hal/interface.h"
#include "cvutils.hpp"
namespace cv {
namespace ndsrvp {
struct operators_minmax_t {
inline void vector(uint8x8_t & a, uint8x8_t & b) const {
uint8x8_t t = a;
a = __nds__v_umin8(a, b);
b = __nds__v_umax8(t, b);
}
inline void scalar(uchar & a, uchar & b) const {
uchar t = a;
a = __nds__umin8(a, b);
b = __nds__umax8(t, b);
}
inline void vector(int8x8_t & a, int8x8_t & b) const {
int8x8_t t = a;
a = __nds__v_smin8(a, b);
b = __nds__v_smax8(t, b);
}
inline void scalar(schar & a, schar & b) const {
schar t = a;
a = __nds__smin8(a, b);
b = __nds__smax8(t, b);
}
inline void vector(uint16x4_t & a, uint16x4_t & b) const {
uint16x4_t t = a;
a = __nds__v_umin16(a, b);
b = __nds__v_umax16(t, b);
}
inline void scalar(ushort & a, ushort & b) const {
ushort t = a;
a = __nds__umin16(a, b);
b = __nds__umax16(t, b);
}
inline void vector(int16x4_t & a, int16x4_t & b) const {
int16x4_t t = a;
a = __nds__v_smin16(a, b);
b = __nds__v_smax16(t, b);
}
inline void scalar(short & a, short & b) const {
short t = a;
a = __nds__smin16(a, b);
b = __nds__smax16(t, b);
}
};
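Each `vector`/`scalar` member of `operators_minmax_t` is a compare-exchange: afterwards `a` holds the minimum and `b` the maximum. A generic scalar sketch (hypothetical names) shows why three of them, as used in the 1-D `ksize == 3` branch below, leave the median in the middle slot.

```cpp
#include <algorithm>
#include <cassert>

// Compare-exchange, the scalar counterpart of operators_minmax_t::scalar():
// afterwards a holds the smaller value and b the larger one.
template <typename T>
inline void cmpSwap(T& a, T& b)
{
    const T t = a;
    a = std::min(a, b);
    b = std::max(t, b);
}

// Median of three with three compare-exchanges.
template <typename T>
T median3(T p0, T p1, T p2)
{
    cmpSwap(p0, p1);   // p0 <= p1
    cmpSwap(p1, p2);   // p2 is now the maximum
    cmpSwap(p0, p1);   // p1 is now the middle value
    return p1;
}
```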
template<typename T, typename WT, typename VT> // type, widen type, vector type
static void
medianBlur_SortNet( const uchar* src_data, size_t src_step,
uchar* dst_data, size_t dst_step,
int width, int height, int cn, int ksize )
{
const T* src = (T*)src_data;
T* dst = (T*)dst_data;
int sstep = (int)(src_step / sizeof(T));
int dstep = (int)(dst_step / sizeof(T));
int i, j, k;
operators_minmax_t op;
if( ksize == 3 )
{
if( width == 1 || height == 1 )
{
int len = width + height - 1;
int sdelta = height == 1 ? cn : sstep;
int sdelta0 = height == 1 ? 0 : sstep - cn;
int ddelta = height == 1 ? cn : dstep;
for( i = 0; i < len; i++, src += sdelta0, dst += ddelta )
for( j = 0; j < cn; j++, src++ )
{
T p0 = src[i > 0 ? -sdelta : 0];
T p1 = src[0];
T p2 = src[i < len - 1 ? sdelta : 0];
op.scalar(p0, p1); op.scalar(p1, p2); op.scalar(p0, p1);
dst[j] = (T)p1;
}
return;
}
width *= cn;
for( i = 0; i < height; i++, dst += dstep )
{
const T* row0 = src + std::max(i - 1, 0)*sstep;
const T* row1 = src + i*sstep;
const T* row2 = src + std::min(i + 1, height-1)*sstep;
int limit = cn;
for(j = 0;; )
{
for( ; j < limit; j++ )
{
int j0 = j >= cn ? j - cn : j;
int j2 = j < width - cn ? j + cn : j;
T p0 = row0[j0], p1 = row0[j], p2 = row0[j2];
T p3 = row1[j0], p4 = row1[j], p5 = row1[j2];
T p6 = row2[j0], p7 = row2[j], p8 = row2[j2];
op.scalar(p1, p2); op.scalar(p4, p5); op.scalar(p7, p8); op.scalar(p0, p1);
op.scalar(p3, p4); op.scalar(p6, p7); op.scalar(p1, p2); op.scalar(p4, p5);
op.scalar(p7, p8); op.scalar(p0, p3); op.scalar(p5, p8); op.scalar(p4, p7);
op.scalar(p3, p6); op.scalar(p1, p4); op.scalar(p2, p5); op.scalar(p4, p7);
op.scalar(p4, p2); op.scalar(p6, p4); op.scalar(p4, p2);
dst[j] = (T)p4;
}
if( limit == width )
break;
int nlanes = 8 / sizeof(T);
for( ; (cn % nlanes == 0) && (j <= width - nlanes - cn); j += nlanes ) // alignment
{
VT p0 = *(VT*)(row0+j-cn), p1 = *(VT*)(row0+j), p2 = *(VT*)(row0+j+cn);
VT p3 = *(VT*)(row1+j-cn), p4 = *(VT*)(row1+j), p5 = *(VT*)(row1+j+cn);
VT p6 = *(VT*)(row2+j-cn), p7 = *(VT*)(row2+j), p8 = *(VT*)(row2+j+cn);
op.vector(p1, p2); op.vector(p4, p5); op.vector(p7, p8); op.vector(p0, p1);
op.vector(p3, p4); op.vector(p6, p7); op.vector(p1, p2); op.vector(p4, p5);
op.vector(p7, p8); op.vector(p0, p3); op.vector(p5, p8); op.vector(p4, p7);
op.vector(p3, p6); op.vector(p1, p4); op.vector(p2, p5); op.vector(p4, p7);
op.vector(p4, p2); op.vector(p6, p4); op.vector(p4, p2);
*(VT*)(dst+j) = p4;
}
limit = width;
}
}
}
else if( ksize == 5 )
{
if( width == 1 || height == 1 )
{
int len = width + height - 1;
int sdelta = height == 1 ? cn : sstep;
int sdelta0 = height == 1 ? 0 : sstep - cn;
int ddelta = height == 1 ? cn : dstep;
for( i = 0; i < len; i++, src += sdelta0, dst += ddelta )
for( j = 0; j < cn; j++, src++ )
{
int i1 = i > 0 ? -sdelta : 0;
int i0 = i > 1 ? -sdelta*2 : i1;
int i3 = i < len-1 ? sdelta : 0;
int i4 = i < len-2 ? sdelta*2 : i3;
T p0 = src[i0], p1 = src[i1], p2 = src[0], p3 = src[i3], p4 = src[i4];
op.scalar(p0, p1); op.scalar(p3, p4); op.scalar(p2, p3); op.scalar(p3, p4); op.scalar(p0, p2);
op.scalar(p2, p4); op.scalar(p1, p3); op.scalar(p1, p2);
dst[j] = (T)p2;
}
return;
}
width *= cn;
for( i = 0; i < height; i++, dst += dstep )
{
const T* row[5];
row[0] = src + std::max(i - 2, 0)*sstep;
row[1] = src + std::max(i - 1, 0)*sstep;
row[2] = src + i*sstep;
row[3] = src + std::min(i + 1, height-1)*sstep;
row[4] = src + std::min(i + 2, height-1)*sstep;
int limit = cn*2;
for(j = 0;; )
{
for( ; j < limit; j++ )
{
T p[25];
int j1 = j >= cn ? j - cn : j;
int j0 = j >= cn*2 ? j - cn*2 : j1;
int j3 = j < width - cn ? j + cn : j;
int j4 = j < width - cn*2 ? j + cn*2 : j3;
for( k = 0; k < 5; k++ )
{
const T* rowk = row[k];
p[k*5] = rowk[j0]; p[k*5+1] = rowk[j1];
p[k*5+2] = rowk[j]; p[k*5+3] = rowk[j3];
p[k*5+4] = rowk[j4];
}
op.scalar(p[1], p[2]); op.scalar(p[0], p[1]); op.scalar(p[1], p[2]); op.scalar(p[4], p[5]); op.scalar(p[3], p[4]);
op.scalar(p[4], p[5]); op.scalar(p[0], p[3]); op.scalar(p[2], p[5]); op.scalar(p[2], p[3]); op.scalar(p[1], p[4]);
op.scalar(p[1], p[2]); op.scalar(p[3], p[4]); op.scalar(p[7], p[8]); op.scalar(p[6], p[7]); op.scalar(p[7], p[8]);
op.scalar(p[10], p[11]); op.scalar(p[9], p[10]); op.scalar(p[10], p[11]); op.scalar(p[6], p[9]); op.scalar(p[8], p[11]);
op.scalar(p[8], p[9]); op.scalar(p[7], p[10]); op.scalar(p[7], p[8]); op.scalar(p[9], p[10]); op.scalar(p[0], p[6]);
op.scalar(p[4], p[10]); op.scalar(p[4], p[6]); op.scalar(p[2], p[8]); op.scalar(p[2], p[4]); op.scalar(p[6], p[8]);
op.scalar(p[1], p[7]); op.scalar(p[5], p[11]); op.scalar(p[5], p[7]); op.scalar(p[3], p[9]); op.scalar(p[3], p[5]);
op.scalar(p[7], p[9]); op.scalar(p[1], p[2]); op.scalar(p[3], p[4]); op.scalar(p[5], p[6]); op.scalar(p[7], p[8]);
op.scalar(p[9], p[10]); op.scalar(p[13], p[14]); op.scalar(p[12], p[13]); op.scalar(p[13], p[14]); op.scalar(p[16], p[17]);
op.scalar(p[15], p[16]); op.scalar(p[16], p[17]); op.scalar(p[12], p[15]); op.scalar(p[14], p[17]); op.scalar(p[14], p[15]);
op.scalar(p[13], p[16]); op.scalar(p[13], p[14]); op.scalar(p[15], p[16]); op.scalar(p[19], p[20]); op.scalar(p[18], p[19]);
op.scalar(p[19], p[20]); op.scalar(p[21], p[22]); op.scalar(p[23], p[24]); op.scalar(p[21], p[23]); op.scalar(p[22], p[24]);
op.scalar(p[22], p[23]); op.scalar(p[18], p[21]); op.scalar(p[20], p[23]); op.scalar(p[20], p[21]); op.scalar(p[19], p[22]);
op.scalar(p[22], p[24]); op.scalar(p[19], p[20]); op.scalar(p[21], p[22]); op.scalar(p[23], p[24]); op.scalar(p[12], p[18]);
op.scalar(p[16], p[22]); op.scalar(p[16], p[18]); op.scalar(p[14], p[20]); op.scalar(p[20], p[24]); op.scalar(p[14], p[16]);
op.scalar(p[18], p[20]); op.scalar(p[22], p[24]); op.scalar(p[13], p[19]); op.scalar(p[17], p[23]); op.scalar(p[17], p[19]);
op.scalar(p[15], p[21]); op.scalar(p[15], p[17]); op.scalar(p[19], p[21]); op.scalar(p[13], p[14]); op.scalar(p[15], p[16]);
op.scalar(p[17], p[18]); op.scalar(p[19], p[20]); op.scalar(p[21], p[22]); op.scalar(p[23], p[24]); op.scalar(p[0], p[12]);
op.scalar(p[8], p[20]); op.scalar(p[8], p[12]); op.scalar(p[4], p[16]); op.scalar(p[16], p[24]); op.scalar(p[12], p[16]);
op.scalar(p[2], p[14]); op.scalar(p[10], p[22]); op.scalar(p[10], p[14]); op.scalar(p[6], p[18]); op.scalar(p[6], p[10]);
op.scalar(p[10], p[12]); op.scalar(p[1], p[13]); op.scalar(p[9], p[21]); op.scalar(p[9], p[13]); op.scalar(p[5], p[17]);
op.scalar(p[13], p[17]); op.scalar(p[3], p[15]); op.scalar(p[11], p[23]); op.scalar(p[11], p[15]); op.scalar(p[7], p[19]);
op.scalar(p[7], p[11]); op.scalar(p[11], p[13]); op.scalar(p[11], p[12]);
dst[j] = (T)p[12];
}
if( limit == width )
break;
int nlanes = 8 / sizeof(T);
for( ; (cn % nlanes == 0) && (j <= width - nlanes - cn*2); j += nlanes )
{
VT p0 = *(VT*)(row[0]+j-cn*2), p5 = *(VT*)(row[1]+j-cn*2), p10 = *(VT*)(row[2]+j-cn*2), p15 = *(VT*)(row[3]+j-cn*2), p20 = *(VT*)(row[4]+j-cn*2);
VT p1 = *(VT*)(row[0]+j-cn*1), p6 = *(VT*)(row[1]+j-cn*1), p11 = *(VT*)(row[2]+j-cn*1), p16 = *(VT*)(row[3]+j-cn*1), p21 = *(VT*)(row[4]+j-cn*1);
VT p2 = *(VT*)(row[0]+j-cn*0), p7 = *(VT*)(row[1]+j-cn*0), p12 = *(VT*)(row[2]+j-cn*0), p17 = *(VT*)(row[3]+j-cn*0), p22 = *(VT*)(row[4]+j-cn*0);
VT p3 = *(VT*)(row[0]+j+cn*1), p8 = *(VT*)(row[1]+j+cn*1), p13 = *(VT*)(row[2]+j+cn*1), p18 = *(VT*)(row[3]+j+cn*1), p23 = *(VT*)(row[4]+j+cn*1);
VT p4 = *(VT*)(row[0]+j+cn*2), p9 = *(VT*)(row[1]+j+cn*2), p14 = *(VT*)(row[2]+j+cn*2), p19 = *(VT*)(row[3]+j+cn*2), p24 = *(VT*)(row[4]+j+cn*2);
op.vector(p1, p2); op.vector(p0, p1); op.vector(p1, p2); op.vector(p4, p5); op.vector(p3, p4);
op.vector(p4, p5); op.vector(p0, p3); op.vector(p2, p5); op.vector(p2, p3); op.vector(p1, p4);
op.vector(p1, p2); op.vector(p3, p4); op.vector(p7, p8); op.vector(p6, p7); op.vector(p7, p8);
op.vector(p10, p11); op.vector(p9, p10); op.vector(p10, p11); op.vector(p6, p9); op.vector(p8, p11);
op.vector(p8, p9); op.vector(p7, p10); op.vector(p7, p8); op.vector(p9, p10); op.vector(p0, p6);
op.vector(p4, p10); op.vector(p4, p6); op.vector(p2, p8); op.vector(p2, p4); op.vector(p6, p8);
op.vector(p1, p7); op.vector(p5, p11); op.vector(p5, p7); op.vector(p3, p9); op.vector(p3, p5);
op.vector(p7, p9); op.vector(p1, p2); op.vector(p3, p4); op.vector(p5, p6); op.vector(p7, p8);
op.vector(p9, p10); op.vector(p13, p14); op.vector(p12, p13); op.vector(p13, p14); op.vector(p16, p17);
op.vector(p15, p16); op.vector(p16, p17); op.vector(p12, p15); op.vector(p14, p17); op.vector(p14, p15);
op.vector(p13, p16); op.vector(p13, p14); op.vector(p15, p16); op.vector(p19, p20); op.vector(p18, p19);
op.vector(p19, p20); op.vector(p21, p22); op.vector(p23, p24); op.vector(p21, p23); op.vector(p22, p24);
op.vector(p22, p23); op.vector(p18, p21); op.vector(p20, p23); op.vector(p20, p21); op.vector(p19, p22);
op.vector(p22, p24); op.vector(p19, p20); op.vector(p21, p22); op.vector(p23, p24); op.vector(p12, p18);
op.vector(p16, p22); op.vector(p16, p18); op.vector(p14, p20); op.vector(p20, p24); op.vector(p14, p16);
op.vector(p18, p20); op.vector(p22, p24); op.vector(p13, p19); op.vector(p17, p23); op.vector(p17, p19);
op.vector(p15, p21); op.vector(p15, p17); op.vector(p19, p21); op.vector(p13, p14); op.vector(p15, p16);
op.vector(p17, p18); op.vector(p19, p20); op.vector(p21, p22); op.vector(p23, p24); op.vector(p0, p12);
op.vector(p8, p20); op.vector(p8, p12); op.vector(p4, p16); op.vector(p16, p24); op.vector(p12, p16);
op.vector(p2, p14); op.vector(p10, p22); op.vector(p10, p14); op.vector(p6, p18); op.vector(p6, p10);
op.vector(p10, p12); op.vector(p1, p13); op.vector(p9, p21); op.vector(p9, p13); op.vector(p5, p17);
op.vector(p13, p17); op.vector(p3, p15); op.vector(p11, p23); op.vector(p11, p15); op.vector(p7, p19);
op.vector(p7, p11); op.vector(p11, p13); op.vector(p11, p12);
*(VT*)(dst+j) = p12;
}
limit = width;
}
}
}
}
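The 3 x 3 branch uses a fixed 19-step selection network that leaves the median in `p4`. The sketch below replays the same compare-exchange sequence on a scalar array (hypothetical names); exhaustively checking all 9! orderings of distinct values is cheap and confirms that slot 4 always receives the median.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Compare-exchange: afterwards a <= b.
template <typename T>
inline void cmpSwap(T& a, T& b)
{
    const T t = a;
    a = std::min(a, b);
    b = std::max(t, b);
}

// Scalar replay of the 19-step 3x3 network used in medianBlur_SortNet.
// p[0..8] is the window in row-major order; the median ends up in p[4].
template <typename T>
T median9(std::array<T, 9> p)
{
    static const int net[19][2] = {
        {1, 2}, {4, 5}, {7, 8}, {0, 1}, {3, 4}, {6, 7}, {1, 2}, {4, 5},
        {7, 8}, {0, 3}, {5, 8}, {4, 7}, {3, 6}, {1, 4}, {2, 5}, {4, 7},
        {4, 2}, {6, 4}, {4, 2}};
    for (const auto& e : net)
        cmpSwap(p[e[0]], p[e[1]]);
    return p[4];
}
```

Only 19 compare-exchanges are needed because the network never fully sorts the window; it only guarantees the middle element.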
int medianBlur(const uchar* src_data, size_t src_step,
uchar* dst_data, size_t dst_step,
int width, int height, int depth, int cn, int ksize)
{
bool useSortNet = ((ksize == 3) || (ksize == 5 && ( depth > CV_8U || cn == 2 || cn > 4 )));
if( useSortNet )
{
// the copy must outlive the medianBlur_SortNet calls below, so it is declared at this scope
std::vector<uchar> src_data_copy;
const uchar* src_data_rep = src_data;
if( dst_data == src_data ) {
src_data_copy.assign(src_data, src_data + src_step * height);
src_data_rep = src_data_copy.data();
}
if( depth == CV_8U )
medianBlur_SortNet<uchar, int, uint8x8_t>( src_data_rep, src_step, dst_data, dst_step, width, height, cn, ksize );
else if( depth == CV_8S )
medianBlur_SortNet<schar, int, int8x8_t>( src_data_rep, src_step, dst_data, dst_step, width, height, cn, ksize );
else if( depth == CV_16U )
medianBlur_SortNet<ushort, int, uint16x4_t>( src_data_rep, src_step, dst_data, dst_step, width, height, cn, ksize );
else if( depth == CV_16S )
medianBlur_SortNet<short, int, int16x4_t>( src_data_rep, src_step, dst_data, dst_step, width, height, cn, ksize );
else
return CV_HAL_ERROR_NOT_IMPLEMENTED;
return CV_HAL_ERROR_OK;
}
else return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
} // namespace ndsrvp
} // namespace cv
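When `dst_data == src_data`, the filter must read from a snapshot of the source whose lifetime covers the whole computation; a buffer declared inside the `if` block is destroyed before it is read. The sketch below shows the pattern on a 1-D median of 3 with a clamped border, a hypothetical helper chosen only to keep the example short. Without the copy, the in-place result would be corrupted by outputs that were already written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 1-D median-of-3 (clamped border) that is safe when dst == src.
void medianRow3(const uint8_t* src, uint8_t* dst, int n)
{
    std::vector<uint8_t> copy;          // declared here so it outlives the loop below
    const uint8_t* s = src;
    if (src == dst) {                   // in place: read from a snapshot instead
        copy.assign(src, src + n);
        s = copy.data();
    }
    for (int i = 0; i < n; i++) {
        const uint8_t a = s[std::max(i - 1, 0)], b = s[i], c = s[std::min(i + 1, n - 1)];
        dst[i] = std::max(std::min(a, b), std::min(std::max(a, b), c));   // median of a, b, c
    }
}
```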


@ -4,6 +4,7 @@ target_include_directories(openvx_hal PUBLIC
${OPENCV_3P_OPENVX_DIR}/include
${CMAKE_SOURCE_DIR}/modules/core/include
${CMAKE_SOURCE_DIR}/modules/imgproc/include
${CMAKE_SOURCE_DIR}/modules/features2d/include
${OPENVX_INCLUDE_DIR})
target_link_libraries(openvx_hal PUBLIC ${OPENVX_LIBRARIES})
set_target_properties(openvx_hal PROPERTIES ARCHIVE_OUTPUT_DIRECTORY ${3P_LIBRARY_OUTPUT_PATH})


@ -1,5 +1,7 @@
#include "openvx_hal.hpp"
#include "opencv2/core/hal/interface.h"
#include "opencv2/imgproc/hal/interface.h"
#include "opencv2/features2d/hal/interface.h"
#define IVX_HIDE_INFO_WARNINGS
#include "ivx.hpp"
@ -191,7 +193,7 @@ int ovx_hal_mul(const T *a, size_t astep, const T *b, size_t bstep, T *c, size_t
#ifdef _WIN32
const float MAGIC_SCALE = 0x0.01010102p0;
#else
const float MAGIC_SCALE = 0x1.010102p-8;
const float MAGIC_SCALE = 0.003922; // 0x1.010102p-8;
#endif
try
{
@ -1145,3 +1147,931 @@ int ovx_hal_integral(int depth, int sdepth, int, const uchar * a, size_t astep,
return CV_HAL_ERROR_OK;
}
int ovx_hal_meanStdDev(const uchar* src_data, size_t src_step, int width, int height,
int src_type, double* mean_val, double* stddev_val, uchar* mask, size_t mask_step)
{
(void)mask_step;
if (src_type != CV_8UC1 || mask)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_MEAN_STDDEV>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (src_step == 0)
{
src_step = (int)width;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
#ifndef VX_VERSION_1_1
if (ctx.vendorID() == VX_ID_KHRONOS)
return CV_HAL_ERROR_NOT_IMPLEMENTED; // Do not use OpenVX meanStdDev estimation for sample 1.0.1 implementation due to lack of accuracy
#endif
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step), const_cast<uchar*>(src_data));
vx_float32 mean_temp, stddev_temp;
ivx::IVX_CHECK_STATUS(vxuMeanStdDev(ctx, ia, &mean_temp, &stddev_temp));
if (mean_val)
{
mean_val[0] = mean_temp;
}
if (stddev_val)
{
stddev_val[0] = stddev_temp;
}
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
int ovx_hal_lut(const uchar *src_data, size_t src_step, size_t src_type,
const uchar* lut_data, size_t lut_channel_size, size_t lut_channels,
uchar *dst_data, size_t dst_step, int width, int height)
{
if (src_type != CV_8UC1 || lut_channels != 1 || lut_channel_size != 1)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_TABLE_LOOKUP>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image ib = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)dst_step),
dst_data);
ivx::LUT lut = ivx::LUT::create(ctx);
lut.copyFrom(lut_data);
ivx::IVX_CHECK_STATUS(vxuTableLookup(ctx, ia, lut, ib));
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
template <> inline bool skipSmallImages<VX_KERNEL_MINMAXLOC>(int w, int h) { return w*h < 3840 * 2160; }
int ovx_hal_minMaxIdxMaskStep(const uchar* src_data, size_t src_step, int width, int height, int depth,
double* minVal, double* maxVal, int* minIdx, int* maxIdx, uchar* mask, size_t mask_step)
{
(void)mask_step;
if ((depth != CV_8U && depth != CV_16S) || mask )
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_MINMAXLOC>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (src_step == 0)
{
src_step = (int)width;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
ivx::Image ia = ivx::Image::createFromHandle(ctx, depth == CV_8U ? VX_DF_IMAGE_U8 : VX_DF_IMAGE_S16,
ivx::Image::createAddressing(width, height, depth == CV_8U ? 1 : 2, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Scalar vxMinVal = ivx::Scalar::create(ctx, depth == CV_8U ? VX_TYPE_UINT8 : VX_TYPE_INT16, 0);
ivx::Scalar vxMaxVal = ivx::Scalar::create(ctx, depth == CV_8U ? VX_TYPE_UINT8 : VX_TYPE_INT16, 0);
ivx::Array vxMinInd, vxMaxInd;
ivx::Scalar vxMinCount, vxMaxCount;
if (minIdx)
{
vxMinInd = ivx::Array::create(ctx, VX_TYPE_COORDINATES2D, 1);
vxMinCount = ivx::Scalar::create(ctx, VX_TYPE_UINT32, 0);
}
if (maxIdx)
{
vxMaxInd = ivx::Array::create(ctx, VX_TYPE_COORDINATES2D, 1);
vxMaxCount = ivx::Scalar::create(ctx, VX_TYPE_UINT32, 0);
}
ivx::IVX_CHECK_STATUS(vxuMinMaxLoc(ctx, ia, vxMinVal, vxMaxVal, vxMinInd, vxMaxInd, vxMinCount, vxMaxCount));
if (minVal)
{
*minVal = depth == CV_8U ? vxMinVal.getValue<vx_uint8>() : vxMinVal.getValue<vx_int16>();
}
if (maxVal)
{
*maxVal = depth == CV_8U ? vxMaxVal.getValue<vx_uint8>() : vxMaxVal.getValue<vx_int16>();
}
if (minIdx)
{
if(vxMinCount.getValue<vx_uint32>()<1) throw ivx::RuntimeError(VX_ERROR_INVALID_VALUE, std::string(__func__) + "(): minimum value location not found");
vx_coordinates2d_t loc;
vxMinInd.copyRangeTo(0, 1, &loc);
minIdx[0] = loc.y;
minIdx[1] = loc.x;
}
if (maxIdx)
{
if (vxMaxCount.getValue<vx_uint32>()<1) throw ivx::RuntimeError(VX_ERROR_INVALID_VALUE, std::string(__func__) + "(): maximum value location not found");
vx_coordinates2d_t loc;
vxMaxInd.copyRangeTo(0, 1, &loc);
maxIdx[0] = loc.y;
maxIdx[1] = loc.x;
}
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
template <> inline bool skipSmallImages<VX_KERNEL_FAST_CORNERS>(int w, int h) { return w*h < 800 * 600; }
int ovx_hal_FAST(const uchar* src_data, size_t src_step, int width, int height, uchar* keypoints_data, size_t* keypoints_count,
int threshold, bool nonmax_suppression, int /*cv::FastFeatureDetector::DetectorType*/ dtype)
{
// Nonmax suppression is done differently in OpenCV than in OpenVX
// 9/16 is the only supported mode in OpenVX
if(nonmax_suppression || dtype != CV_HAL_TYPE_9_16)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_FAST_CORNERS>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context context = getOpenVXHALContext();
ivx::Image img = ivx::Image::createFromHandle(context, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Scalar vxthreshold = ivx::Scalar::create<VX_TYPE_FLOAT32>(context, threshold);
vx_size capacity = width * height;
ivx::Array corners = ivx::Array::create(context, VX_TYPE_KEYPOINT, capacity);
ivx::Scalar numCorners = ivx::Scalar::create<VX_TYPE_SIZE>(context, 0);
ivx::IVX_CHECK_STATUS(vxuFastCorners(context, img, vxthreshold, (vx_bool)nonmax_suppression, corners, numCorners));
size_t nPoints = numCorners.getValue<vx_size>();
std::vector<vx_keypoint_t> vxCorners(nPoints);
corners.copyTo(vxCorners);
cvhalKeyPoint* keypoints = (cvhalKeyPoint*)keypoints_data;
for(size_t i = 0; i < std::min(nPoints, *keypoints_count); i++)
{
//if nonmaxSuppression is false, vxCorners[i].strength is undefined
keypoints[i].x = vxCorners[i].x;
keypoints[i].y = vxCorners[i].y;
keypoints[i].size = 7;
keypoints[i].angle = -1;
keypoints[i].response = vxCorners[i].strength;
}
*keypoints_count = std::min(nPoints, *keypoints_count);
#ifdef VX_VERSION_1_1
//we should take user memory back before release
//(it's not done automatically according to standard)
img.swapHandle();
#endif
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
template <> inline bool skipSmallImages<VX_KERNEL_MEDIAN_3x3>(int w, int h) { return w*h < 1280 * 720; }
int ovx_hal_medianBlur(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
int width, int height, int depth, int cn, int ksize)
{
if (depth != CV_8U || cn != 1
#ifndef VX_VERSION_1_1
|| ksize != 3
#endif
)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (
#ifdef VX_VERSION_1_1
ksize != 3 ? skipSmallImages<VX_KERNEL_NON_LINEAR_FILTER>(width, height) :
#endif
skipSmallImages<VX_KERNEL_MEDIAN_3x3>(width, height)
)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
#ifdef VX_VERSION_1_1
if ((vx_size)ksize > ctx.nonlinearMaxDimension())
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
#endif
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image ib = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)(dst_step)),
dst_data);
//ATTENTION: VX_CONTEXT_IMMEDIATE_BORDER attribute change could lead to strange issues in multi-threaded environments
//since OpenVX standard says nothing about thread-safety for now
ivx::border_t prevBorder = ctx.immediateBorder();
ctx.setImmediateBorder(VX_BORDER_REPLICATE);
#ifdef VX_VERSION_1_1
if (ksize == 3)
#endif
{
ivx::IVX_CHECK_STATUS(vxuMedian3x3(ctx, ia, ib));
}
#ifdef VX_VERSION_1_1
else
{
ivx::Matrix mtx;
if(ksize == 5)
mtx = ivx::Matrix::createFromPattern(ctx, VX_PATTERN_BOX, ksize, ksize);
else
{
vx_size supportedSize;
ivx::IVX_CHECK_STATUS(vxQueryContext(ctx, VX_CONTEXT_NONLINEAR_MAX_DIMENSION, &supportedSize, sizeof(supportedSize)));
if ((vx_size)ksize > supportedSize)
{
ctx.setImmediateBorder(prevBorder);
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
std::vector<uchar> mtx_data(ksize*ksize, 255);
mtx = ivx::Matrix::create(ctx, VX_TYPE_UINT8, ksize, ksize);
mtx.copyFrom(&mtx_data[0]);
}
ivx::IVX_CHECK_STATUS(vxuNonLinearFilter(ctx, VX_NONLINEAR_FILTER_MEDIAN, ia, mtx, ib));
}
#endif
ctx.setImmediateBorder(prevBorder);
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
template <> inline bool skipSmallImages<VX_KERNEL_SOBEL_3x3>(int w, int h) { return w*h < 320 * 240; }
int ovx_hal_sobel(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step, int width, int height, int src_depth, int dst_depth, int cn, int margin_left, int margin_top, int margin_right, int margin_bottom, int dx, int dy, int ksize, double scale, double delta, int border_type)
{
if (cn != 1 || src_depth != CV_8U || dst_depth != CV_16S ||
ksize != 3 || scale != 1.0 || delta != 0.0 ||
(dx | dy) != 1 || (dx + dy) != 1 || width < ksize || height < ksize)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
// ~BORDER_ISOLATED case not supported for now
if (margin_left != 0 || margin_top != 0 || margin_right != 0 || margin_bottom != 0)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_SOBEL_3x3>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
vx_enum border;
switch (border_type)
{
case CV_HAL_BORDER_CONSTANT:
border = VX_BORDER_CONSTANT;
break;
case CV_HAL_BORDER_REPLICATE:
// border = VX_BORDER_REPLICATE;
// break;
default:
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
//if ((vx_size)ksize > ctx.convolutionMaxDimension())
// return false;
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)(src_step)),
const_cast<uchar*>(src_data));
ivx::Image ib = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_S16,
ivx::Image::createAddressing(width, height, 2, (vx_int32)dst_step),
dst_data);
//ATTENTION: VX_CONTEXT_IMMEDIATE_BORDER attribute change could lead to strange issues in multi-threaded environments
//since OpenVX standard says nothing about thread-safety for now
ivx::border_t prevBorder = ctx.immediateBorder();
ctx.setImmediateBorder(border, (vx_uint8)(0));
if(dx)
ivx::IVX_CHECK_STATUS(vxuSobel3x3(ctx, ia, ib, NULL));
else
ivx::IVX_CHECK_STATUS(vxuSobel3x3(ctx, ia, NULL, ib));
ctx.setImmediateBorder(prevBorder);
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
template <> inline bool skipSmallImages<VX_KERNEL_CANNY_EDGE_DETECTOR>(int w, int h) { return w*h < 640 * 480; }
int ovx_hal_canny(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
int width, int height, int cn, double lowThreshold, double highThreshold, int ksize, bool L2gradient)
{
if (cn != 1 || width <= ksize || height <= ksize)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_CANNY_EDGE_DETECTOR>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
ivx::Context context = getOpenVXHALContext();
try
{
ivx::Image _src = ivx::Image::createFromHandle(context, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image _dst = ivx::Image::createFromHandle( context, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)dst_step),
dst_data);
ivx::Threshold threshold = ivx::Threshold::createRange(context, VX_TYPE_UINT8,
(vx_int32)lowThreshold,
(vx_int32)highThreshold);
#if 0
// the code below is disabled because vxuCannyEdgeDetector()
// ignores context attribute VX_CONTEXT_IMMEDIATE_BORDER
// FIXME: may fail in multithread case
border_t prevBorder = context.immediateBorder();
context.setImmediateBorder(VX_BORDER_REPLICATE);
IVX_CHECK_STATUS( vxuCannyEdgeDetector(context, _src, threshold, ksize, (L2gradient ? VX_NORM_L2 : VX_NORM_L1), _dst) );
context.setImmediateBorder(prevBorder);
#else
// alternative code without vxuCannyEdgeDetector()
ivx::Graph graph = ivx::Graph::create(context);
ivx::Node node = ivx::Node(vxCannyEdgeDetectorNode(graph, _src, threshold, ksize,
(L2gradient ? VX_NORM_L2 : VX_NORM_L1), _dst) );
node.setBorder(VX_BORDER_REPLICATE);
graph.verify();
graph.process();
#endif
#ifdef VX_VERSION_1_1
_src.swapHandle();
_dst.swapHandle();
#endif
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
// static bool openvx_pyrDown( InputArray _src, OutputArray _dst, const Size& _dsz, int borderType )
int ovx_hal_pyrdown(const uchar* src_data, size_t src_step, int src_width, int src_height,
uchar* dst_data, size_t dst_step, int dst_width, int dst_height, int depth, int cn, int border_type)
{
if (depth != CV_8U || border_type != CV_HAL_BORDER_REPLICATE)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_HALFSCALE_GAUSSIAN>(src_width, src_height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
// The only border mode which is supported by both cv::pyrDown() and OpenVX
// and produces predictable results
ivx::border_t borderMode;
borderMode.mode = VX_BORDER_REPLICATE;
try
{
ivx::Context context = getOpenVXHALContext();
if(context.vendorID() == VX_ID_KHRONOS)
{
// This implementation performs floor-like rounding
// (OpenCV uses floor(x+0.5)-like rounding)
// and ignores border mode (and loses 1px size border)
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
ivx::Image srcImg = ivx::Image::createFromHandle(context, ivx::Image::matTypeToFormat(CV_8UC(cn)),
ivx::Image::createAddressing(src_width, src_height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image dstImg = ivx::Image::createFromHandle(context, ivx::Image::matTypeToFormat(CV_8UC(cn)),
ivx::Image::createAddressing(dst_width, dst_height, 1, (vx_int32)dst_step),
dst_data);
ivx::Scalar kernelSize = ivx::Scalar::create<VX_TYPE_INT32>(context, 5);
ivx::Graph graph = ivx::Graph::create(context);
ivx::Node halfNode = ivx::Node::create(graph, VX_KERNEL_HALFSCALE_GAUSSIAN, srcImg, dstImg, kernelSize);
halfNode.setBorder(borderMode);
graph.verify();
graph.process();
#ifdef VX_VERSION_1_1
//we should take user memory back before release
//(it's not done automatically according to standard)
srcImg.swapHandle(); dstImg.swapHandle();
#endif
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
template <> inline bool skipSmallImages<VX_KERNEL_BOX_3x3>(int w, int h) { return w*h < 640 * 480; }
int ovx_hal_boxFilter(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
int width, int height, int src_depth, int dst_depth, int cn,
int margin_left, int margin_top, int margin_right, int margin_bottom,
size_t ksize_width, size_t ksize_height, int anchor_x, int anchor_y,
bool normalize, int border_type)
{
if (src_depth != CV_8U || cn != 1 || ksize_width != 3 || ksize_height != 3 || dst_depth != CV_8U ||
(anchor_x >= 0 && anchor_x != 1) || (anchor_y >= 0 && anchor_y != 1) || !normalize)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
// ~BORDER_ISOLATED case not supported for now
if (margin_left != 0 || margin_top != 0 || margin_right != 0 || margin_bottom != 0)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if(skipSmallImages<VX_KERNEL_BOX_3x3>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
vx_enum border;
switch (border_type)
{
case CV_HAL_BORDER_CONSTANT:
border = VX_BORDER_CONSTANT;
break;
case CV_HAL_BORDER_REPLICATE:
border = VX_BORDER_REPLICATE;
break;
default:
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image ib = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)dst_step),
dst_data);
//ATTENTION: VX_CONTEXT_IMMEDIATE_BORDER attribute change could lead to strange issues in multi-threaded environments
//since OpenVX standard says nothing about thread-safety for now
ivx::border_t prevBorder = ctx.immediateBorder();
ctx.setImmediateBorder(border, (vx_uint8)(0));
ivx::IVX_CHECK_STATUS(vxuBox3x3(ctx, ia, ib));
ctx.setImmediateBorder(prevBorder);
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
int ovx_hal_equalize_hist(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step, int width, int height)
{
if (skipSmallImages<VX_KERNEL_EQUALIZE_HISTOGRAM>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context context = getOpenVXHALContext();
ivx::Image srcImage = ivx::Image::createFromHandle(context, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image dstImage = ivx::Image::createFromHandle(context, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)dst_step),
dst_data);
ivx::IVX_CHECK_STATUS(vxuEqualizeHist(context, srcImage, dstImage));
#ifdef VX_VERSION_1_1
//we should take user memory back before release
//(it's not done automatically according to standard)
srcImage.swapHandle(); dstImage.swapHandle();
#endif
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
int ovx_hal_gaussianBlur(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step, int width, int height,
int depth, int cn, size_t margin_left, size_t margin_top, size_t margin_right, size_t margin_bottom,
size_t ksize_width, size_t ksize_height, double sigmaX, double sigmaY, int border_type)
{
if (sigmaY <= 0)
sigmaY = sigmaX;
// automatic detection of kernel size from sigma
if (ksize_width <= 0 && sigmaX > 0)
ksize_width = (vx_int32)(sigmaX*6 + 1) | 1;
if (ksize_height <= 0 && sigmaY > 0)
ksize_height = (vx_int32)(sigmaY*6 + 1) | 1;
if (depth != CV_8U || cn != 1 || width < 3 || height < 3 ||
ksize_width != 3 || ksize_height != 3)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
sigmaX = std::max(sigmaX, 0.);
sigmaY = std::max(sigmaY, 0.);
if (!(sigmaX == 0.0 || (sigmaX - 0.8) < DBL_EPSILON) || !(sigmaY == 0.0 || (sigmaY - 0.8) < DBL_EPSILON))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
// ~BORDER_ISOLATED case not supported for now
if (margin_left != 0 || margin_top != 0 || margin_right != 0 || margin_bottom != 0)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_GAUSSIAN_3x3>(width, height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
vx_enum border;
switch (border_type)
{
case CV_HAL_BORDER_CONSTANT:
border = VX_BORDER_CONSTANT;
break;
case CV_HAL_BORDER_REPLICATE:
border = VX_BORDER_REPLICATE;
break;
default:
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image ib = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width, height, 1, (vx_int32)dst_step),
dst_data);
//ATTENTION: VX_CONTEXT_IMMEDIATE_BORDER attribute change could lead to strange issues in multi-threaded environments
//since OpenVX standard says nothing about thread-safety for now
ivx::border_t prevBorder = ctx.immediateBorder();
ctx.setImmediateBorder(border, (vx_uint8)(0));
ivx::IVX_CHECK_STATUS(vxuGaussian3x3(ctx, ia, ib));
ctx.setImmediateBorder(prevBorder);
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
int ovx_hal_remap32f(int src_type, const uchar *src_data, size_t src_step, int src_width, int src_height,
uchar *dst_data, size_t dst_step, int dst_width, int dst_height,
float* mapx, size_t mapx_step, float* mapy, size_t mapy_step,
int interpolation, int border_type, const double border_value[4])
{
if (src_type != CV_8UC1 || border_type != CV_HAL_BORDER_CONSTANT || (interpolation & CV_HAL_WARP_RELATIVE_MAP))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
if (skipSmallImages<VX_KERNEL_REMAP>(src_width, src_height))
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
vx_interpolation_type_e inter_type;
switch (interpolation)
{
case CV_HAL_INTER_LINEAR:
#if VX_VERSION > VX_VERSION_1_0
inter_type = VX_INTERPOLATION_BILINEAR;
#else
inter_type = VX_INTERPOLATION_TYPE_BILINEAR;
#endif
break;
case CV_HAL_INTER_NEAREST:
/* NEAREST_NEIGHBOR mode disabled since OpenCV rounds half to even while the OpenVX sample implementation rounds half up
#if VX_VERSION > VX_VERSION_1_0
inter_type = VX_INTERPOLATION_NEAREST_NEIGHBOR;
#else
inter_type = VX_INTERPOLATION_TYPE_NEAREST_NEIGHBOR;
#endif
if (!map1.empty())
for (int y = 0; y < map1.rows; ++y)
{
float* line = map1.ptr<float>(y);
for (int x = 0; x < map1.cols; ++x)
line[x] = cvRound(line[x]);
}
if (!map2.empty())
for (int y = 0; y < map2.rows; ++y)
{
float* line = map2.ptr<float>(y);
for (int x = 0; x < map2.cols; ++x)
line[x] = cvRound(line[x]);
}
break;
*/
case CV_HAL_INTER_AREA://AREA interpolation mode is unsupported
default:
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(src_width, src_height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image ib = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(dst_width, dst_height, 1, (vx_int32)dst_step),
dst_data);
//ATTENTION: VX_CONTEXT_IMMEDIATE_BORDER attribute change could lead to strange issues in multi-threaded environments
//since OpenVX standard says nothing about thread-safety for now
ivx::border_t prevBorder = ctx.immediateBorder();
ctx.setImmediateBorder(VX_BORDER_CONSTANT, (vx_uint8)(border_value[0]));
ivx::Remap map = ivx::Remap::create(ctx, src_width, src_height, dst_width, dst_height);
if (!mapx) map.setMappings(mapy, mapy_step);
else if (!mapy) map.setMappings(mapx, mapx_step);
else map.setMappings(mapx, mapx_step, mapy, mapy_step);
ivx::IVX_CHECK_STATUS(vxuRemap(ctx, ia, map, inter_type, ib));
#ifdef VX_VERSION_1_1
ib.swapHandle();
ia.swapHandle();
#endif
ctx.setImmediateBorder(prevBorder);
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
#define IMPL_OPENVX_TOZERO 1
int ovx_hal_threshold(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
int width, int height, int depth, int cn, double thresh, double maxValue, int thresholdType)
{
if(depth != CV_8U)
{
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
int trueVal, falseVal;
switch (thresholdType)
{
case CV_HAL_THRESH_BINARY:
#ifndef VX_VERSION_1_1
if (maxValue != 255)
return CV_HAL_ERROR_NOT_IMPLEMENTED;
#endif
trueVal = maxValue;
falseVal = 0;
break;
case CV_HAL_THRESH_TOZERO:
#if IMPL_OPENVX_TOZERO
trueVal = 255;
falseVal = 0;
break;
#endif
case CV_HAL_THRESH_BINARY_INV:
#ifdef VX_VERSION_1_1
trueVal = 0;
falseVal = maxValue;
break;
#endif
case CV_HAL_THRESH_TOZERO_INV:
#ifdef VX_VERSION_1_1
#if IMPL_OPENVX_TOZERO
trueVal = 0;
falseVal = 255;
break;
#endif
#endif
case CV_HAL_THRESH_TRUNC:
default:
return CV_HAL_ERROR_NOT_IMPLEMENTED;
}
try
{
ivx::Context ctx = getOpenVXHALContext();
ivx::Threshold thh = ivx::Threshold::createBinary(ctx, VX_TYPE_UINT8, thresh);
thh.setValueTrue(trueVal);
thh.setValueFalse(falseVal);
ivx::Image ia = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width*cn, height, 1, (vx_int32)src_step),
const_cast<uchar*>(src_data));
ivx::Image ib = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width*cn, height, 1, (vx_int32)dst_step),
dst_data);
ivx::IVX_CHECK_STATUS(vxuThreshold(ctx, ia, thh, ib));
#if IMPL_OPENVX_TOZERO
if (thresholdType == CV_HAL_THRESH_TOZERO || thresholdType == CV_HAL_THRESH_TOZERO_INV)
{
ivx::Image ic = ivx::Image::createFromHandle(ctx, VX_DF_IMAGE_U8,
ivx::Image::createAddressing(width*cn, height, 1, (vx_int32)dst_step), dst_data);
ivx::IVX_CHECK_STATUS(vxuAnd(ctx, ib, ia, ic));
}
#endif
}
catch (const ivx::RuntimeError & e)
{
PRINT_HALERR_MSG(runtime);
return CV_HAL_ERROR_UNKNOWN;
}
catch (const ivx::WrapperError & e)
{
PRINT_HALERR_MSG(wrapper);
return CV_HAL_ERROR_UNKNOWN;
}
return CV_HAL_ERROR_OK;
}
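`ovx_hal_threshold` emulates `THRESH_TOZERO` (and, with OpenVX 1.1, `THRESH_TOZERO_INV`) in two steps. First, a binary threshold produces a 0/255 mask. Then `vxuAnd` combines that mask with the source. Because `255 & v == v` and `0 & v == 0`, the AND keeps the pixels whose mask is 255 and zeroes the rest.

The plain C++ sketch below reproduces that arithmetic on a byte buffer without OpenVX. The helper name is hypothetical. The sketch assumes the OpenVX binary threshold writes the true value when `src > thresh`, matching OpenCV's strict comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an OpenCV/OpenVX API): THRESH_TOZERO or THRESH_TOZERO_INV
// computed as a 0/255 binary-threshold mask followed by a bitwise AND with the source.
void threshold_tozero_via_and(const std::uint8_t* src, std::uint8_t* dst, std::size_t n,
                              std::uint8_t thresh, bool inverse)
{
    assert(src != nullptr && dst != nullptr);
    // Mask values for "src > thresh" and for everything else: 255 keeps a pixel, 0 clears it.
    const std::uint8_t trueVal = inverse ? 0 : 255;
    const std::uint8_t falseVal = inverse ? 255 : 0;
    for (std::size_t i = 0; i < n; i++)
    {
        const std::uint8_t mask = src[i] > thresh ? trueVal : falseVal; // binary threshold step
        dst[i] = static_cast<std::uint8_t>(mask & src[i]);             // AND step
    }
}
```

Each output element depends only on the matching input element. That is also why the HAL code can safely run `vxuAnd` with `dst_data` as both an input (`ib`) and the output (`ic`).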

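Several functions above follow the same pattern: `ovx_hal_medianBlur`, `ovx_hal_sobel`, `ovx_hal_boxFilter`, `ovx_hal_gaussianBlur` and `ovx_hal_remap32f`. Each saves the context's immediate border, changes it, and restores it after the `vxu*` call. If `IVX_CHECK_STATUS` throws, the restore line is skipped, and the shared context keeps the temporary border. One early-return path in `ovx_hal_medianBlur` also has to restore it by hand.

A scope guard would restore the border on every exit path. The sketch below uses a hypothetical `FakeContext` in place of `ivx::Context`. Only the `immediateBorder()`/`setImmediateBorder()` pair is modelled, with the border reduced to an `int`, and the guard class is not part of the ivx wrapper.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for ivx::Context: only the border accessors are modelled.
struct FakeContext
{
    int border = 0;
    int immediateBorder() const { return border; }
    void setImmediateBorder(int b) { border = b; }
};

// Sets a temporary immediate border and restores the previous one when the scope ends,
// including when an exception propagates out of the vxu* call.
template <typename Ctx>
class ImmediateBorderGuard
{
public:
    ImmediateBorderGuard(Ctx& ctx, int border) : ctx_(ctx), prev_(ctx.immediateBorder())
    {
        ctx_.setImmediateBorder(border);
    }
    ~ImmediateBorderGuard() { ctx_.setImmediateBorder(prev_); }
    ImmediateBorderGuard(const ImmediateBorderGuard&) = delete;
    ImmediateBorderGuard& operator=(const ImmediateBorderGuard&) = delete;

private:
    Ctx& ctx_;
    int prev_;
};

// Simulates a HAL function whose status check may throw.
void run_filter(FakeContext& ctx, bool fail)
{
    ImmediateBorderGuard<FakeContext> guard(ctx, 2);
    assert(ctx.immediateBorder() == 2); // temporary border is active inside the scope
    if (fail)
        throw std::runtime_error("simulated IVX_CHECK_STATUS failure");
}
```

With the real wrapper, restoring the border may itself report an error. A production guard would therefore need to catch and log inside the destructor rather than let an exception escape it.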
@@ -54,6 +54,33 @@ int ovx_hal_cvtThreePlaneYUVtoBGR(const uchar * a, size_t astep, uchar * b, size
int ovx_hal_cvtBGRtoThreePlaneYUV(const uchar * a, size_t astep, uchar * b, size_t bstep, int w, int h, int acn, bool swapBlue, int uIdx);
int ovx_hal_cvtOnePlaneYUVtoBGR(const uchar * a, size_t astep, uchar * b, size_t bstep, int w, int h, int bcn, bool swapBlue, int uIdx, int ycn);
int ovx_hal_integral(int depth, int sdepth, int, const uchar * a, size_t astep, uchar * b, size_t bstep, uchar * c, size_t, uchar * d, size_t, int w, int h, int cn);
int ovx_hal_meanStdDev(const uchar* src_data, size_t src_step, int width, int height,
int src_type, double* mean_val, double* stddev_val, uchar* mask, size_t mask_step);
int ovx_hal_lut(const uchar *src_data, size_t src_step, size_t src_type, const uchar* lut_data, size_t lut_channel_size, size_t lut_channels, uchar *dst_data, size_t dst_step, int width, int height);
int ovx_hal_minMaxIdxMaskStep(const uchar* src_data, size_t src_step, int width, int height, int depth,
double* minVal, double* maxVal, int* minIdx, int* maxIdx, uchar* mask, size_t mask_step);
int ovx_hal_medianBlur(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step, int width, int height, int depth, int cn, int ksize);
int ovx_hal_sobel(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step, int width, int height, int src_depth, int dst_depth, int cn, int margin_left, int margin_top, int margin_right, int margin_bottom, int dx, int dy, int ksize, double scale, double delta, int border_type);
int ovx_hal_canny(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
int width, int height, int cn, double lowThreshold, double highThreshold, int ksize, bool L2gradient);
int ovx_hal_pyrdown(const uchar* src_data, size_t src_step, int src_width, int src_height,
uchar* dst_data, size_t dst_step, int dst_width, int dst_height, int depth, int cn, int border_type);
int ovx_hal_boxFilter(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
int width, int height, int src_depth, int dst_depth, int cn,
int margin_left, int margin_top, int margin_right, int margin_bottom,
size_t ksize_width, size_t ksize_height, int anchor_x, int anchor_y, bool normalize, int border_type);
int ovx_hal_equalize_hist(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step, int width, int height);
int ovx_hal_gaussianBlur(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step, int width, int height,
int depth, int cn, size_t margin_left, size_t margin_top, size_t margin_right, size_t margin_bottom,
size_t ksize_width, size_t ksize_height, double sigmaX, double sigmaY, int border_type);
int ovx_hal_remap32f(int src_type, const uchar *src_data, size_t src_step, int src_width, int src_height,
uchar *dst_data, size_t dst_step, int dst_width, int dst_height,
float* mapx, size_t mapx_step, float* mapy, size_t mapy_step,
int interpolation, int border_type, const double border_value[4]);
int ovx_hal_threshold(const uchar* src_data, size_t src_step, uchar* dst_data, size_t dst_step,
int width, int height, int depth, int cn, double thresh, double maxValue, int thresholdType);
int ovx_hal_FAST(const uchar* src_data, size_t src_step, int width, int height, uchar* keypoints_data, size_t* keypoints_count,
int threshold, bool nonmax_suppression, int /*cv::FastFeatureDetector::DetectorType*/ dtype);
//==================================================================================================
// functions redefinition
@@ -141,5 +168,11 @@ int ovx_hal_integral(int depth, int sdepth, int, const uchar * a, size_t astep,
#define cv_hal_cvtOnePlaneYUVtoBGR ovx_hal_cvtOnePlaneYUVtoBGR
#undef cv_hal_integral
#define cv_hal_integral ovx_hal_integral
#undef cv_hal_meanStdDev
#define cv_hal_meanStdDev ovx_hal_meanStdDev
#undef cv_hal_lut
#define cv_hal_lut ovx_hal_lut
#undef cv_hal_minMaxIdxMaskStep
#define cv_hal_minMaxIdxMaskStep ovx_hal_minMaxIdxMaskStep
#endif

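The header hunk above declares the new `ovx_hal_*` entry points. At the end of the file it redirects OpenCV's HAL hooks to them with `#undef cv_hal_X` / `#define cv_hal_X ovx_hal_X`. OpenCV's calling code invokes the `cv_hal_*` macro and falls back to its own implementation when the result is `CV_HAL_ERROR_NOT_IMPLEMENTED`.

The self-contained sketch below mimics that contract. It uses invented names (`hal_ni_lut`, `my_hal_lut`, `lut_with_fallback`) and local copies of the status values. It is loosely modelled on the pattern and is not OpenCV's actual replacement machinery.

```cpp
#include <cassert>
#include <cstddef>

// Local copies of the HAL status values (CV_HAL_ERROR_OK, _NOT_IMPLEMENTED, _UNKNOWN).
enum { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1, HAL_UNKNOWN = -1 };

// Default hook: no accelerated implementation.
inline int hal_ni_lut(const unsigned char*, unsigned char*, const unsigned char*, std::size_t)
{
    return HAL_NOT_IMPLEMENTED;
}
#define cv_hal_lut hal_ni_lut

// "Backend" implementation; declines tiny inputs the way skipSmallImages does.
inline int my_hal_lut(const unsigned char* src, unsigned char* dst,
                      const unsigned char* lut, std::size_t n)
{
    if (n < 4)
        return HAL_NOT_IMPLEMENTED;
    for (std::size_t i = 0; i < n; i++)
        dst[i] = lut[src[i]];
    return HAL_OK;
}
// Backend header: redirect the hook, like the #undef/#define pairs above.
#undef cv_hal_lut
#define cv_hal_lut my_hal_lut

// Caller side: returns true if the backend handled the call, false if it fell back.
bool lut_with_fallback(const unsigned char* src, unsigned char* dst,
                       const unsigned char* lut, std::size_t n)
{
    const int status = cv_hal_lut(src, dst, lut, n);
    if (status == HAL_OK)
        return true;
    // Any code other than NOT_IMPLEMENTED would mean the backend failed.
    assert(status == HAL_NOT_IMPLEMENTED);
    for (std::size_t i = 0; i < n; i++) // generic fallback path
        dst[i] = lut[src[i]];
    return false;
}
```

Because `HAL_OK` is 0, a backend must return `HAL_NOT_IMPLEMENTED` explicitly when it declines. Returning 0 from a declining path would be read as success, and the output would never be computed.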
@@ -22,7 +22,13 @@ Details: TBD
#include <VX/vx.h>
#include <VX/vxu.h>
#ifndef VX_VERSION_1_1
// For OpenVX 1.2 & 1.3
#if (VX_VERSION > VX_VERSION_1_1)
# include <VX/vx_compatibility.h>
#endif
#if (VX_VERSION == VX_VERSION_1_0)
// 1.1 to 1.0 backward compatibility defines
static const vx_enum VX_INTERPOLATION_BILINEAR = VX_INTERPOLATION_TYPE_BILINEAR;
@@ -32,12 +38,6 @@ static const vx_enum VX_INTERPOLATION_NEAREST_NEIGHBOR = VX_INTERPOLATION_TYPE_N
static const vx_enum VX_BORDER_CONSTANT = VX_BORDER_MODE_CONSTANT;
static const vx_enum VX_BORDER_REPLICATE = VX_BORDER_MODE_REPLICATE;
#else
#ifdef IVX_RENAMED_REFS
static const vx_enum VX_REF_ATTRIBUTE_TYPE = VX_REFERENCE_TYPE;
#endif
#endif
#ifndef IVX_USE_CXX98
@@ -218,7 +218,7 @@ template<> struct TypeToEnum<vx_int64> { static const vx_enum value = VX_TYPE
template<> struct TypeToEnum<vx_uint64> { static const vx_enum value = VX_TYPE_UINT64; };
template<> struct TypeToEnum<vx_float32> { static const vx_enum value = VX_TYPE_FLOAT32, imgType = VX_DF_IMAGE('F', '0', '3', '2'); };
template<> struct TypeToEnum<vx_float64> { static const vx_enum value = VX_TYPE_FLOAT64; };
template<> struct TypeToEnum<vx_bool> { static const vx_enum value = VX_TYPE_BOOL; };
//template<> struct TypeToEnum<vx_bool> { static const vx_enum value = VX_TYPE_BOOL; };
template<> struct TypeToEnum<vx_keypoint_t> {static const vx_enum value = VX_TYPE_KEYPOINT; };
// the commented types are aliases (of integral types) and have conflicts with the types above
//template<> struct TypeToEnum<vx_enum> { static const vx_enum val = VX_TYPE_ENUM; };
@@ -1717,6 +1717,22 @@ static const vx_enum
#endif
}
/// Convert cv::Mat type to standard image format (fourcc), throws WrapperError if not possible
static vx_df_image matTypeToFormat(int matType)
{
switch (matType)
{
case CV_8UC4: return VX_DF_IMAGE_RGBX;
case CV_8UC3: return VX_DF_IMAGE_RGB;
case CV_8UC1: return VX_DF_IMAGE_U8;
case CV_16UC1: return VX_DF_IMAGE_U16;
case CV_16SC1: return VX_DF_IMAGE_S16;
case CV_32SC1: return VX_DF_IMAGE_S32;
case CV_32FC1: return VX_DF_IMAGE('F', '0', '3', '2');
default: throw WrapperError(std::string(__func__)+"(): unsupported cv::Mat type");
}
}
#ifdef IVX_USE_OPENCV
/// Convert image format (fourcc) to cv::Mat type, throws WrapperError if not possible
static int formatToMatType(vx_df_image format, vx_uint32 planeIdx = 0)
@ -1742,22 +1758,6 @@ static const vx_enum
}
}
/// Convert cv::Mat type to standard image format (fourcc), throws WrapperError if not possible
static vx_df_image matTypeToFormat(int matType)
{
switch (matType)
{
case CV_8UC4: return VX_DF_IMAGE_RGBX;
case CV_8UC3: return VX_DF_IMAGE_RGB;
case CV_8UC1: return VX_DF_IMAGE_U8;
case CV_16UC1: return VX_DF_IMAGE_U16;
case CV_16SC1: return VX_DF_IMAGE_S16;
case CV_32SC1: return VX_DF_IMAGE_S32;
case CV_32FC1: return VX_DF_IMAGE('F', '0', '3', '2');
default: throw WrapperError(std::string(__func__)+"(): unsupported cv::Mat type");
}
}
/// Initialize cv::Mat shape to fit the specified image plane data
void createMatForPlane(cv::Mat& m, vx_uint32 planeIdx)
{
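`matTypeToFormat` maps OpenCV type codes to OpenVX fourcc image formats. Below is a minimal C sketch of both encodings involved. It assumes that OpenVX packs `VX_DF_IMAGE(a,b,c,d)` with the first character in the low byte, and that OpenCV type codes are `depth + ((channels - 1) << 3)`. All `DEMO_*` names and `demo_mat_type_to_format` are hypothetical. The sketch returns 0 for unsupported types instead of throwing `WrapperError`.

```c
#include <stdint.h>

/* Fourcc packing: first character in the lowest byte (our reading of VX_DF_IMAGE). */
#define DEMO_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* OpenCV-style type code: depth in the low 3 bits, (channels - 1) above them. */
#define DEMO_MAKETYPE(depth, cn) ((depth) + (((cn) - 1) << 3))
enum { DEMO_8U = 0, DEMO_16U = 2, DEMO_16S = 3, DEMO_32S = 4, DEMO_32F = 5 };

/* Returns 0 for unsupported types (the C++ wrapper throws instead). */
uint32_t demo_mat_type_to_format(int mat_type) {
    switch (mat_type) {
    case DEMO_MAKETYPE(DEMO_8U, 4):  return DEMO_FOURCC('R', 'G', 'B', 'A'); /* RGBX */
    case DEMO_MAKETYPE(DEMO_8U, 3):  return DEMO_FOURCC('R', 'G', 'B', '2'); /* RGB */
    case DEMO_MAKETYPE(DEMO_8U, 1):  return DEMO_FOURCC('U', '0', '0', '8');
    case DEMO_MAKETYPE(DEMO_16U, 1): return DEMO_FOURCC('U', '0', '1', '6');
    case DEMO_MAKETYPE(DEMO_16S, 1): return DEMO_FOURCC('S', '0', '1', '6');
    case DEMO_MAKETYPE(DEMO_32S, 1): return DEMO_FOURCC('S', '0', '3', '2');
    case DEMO_MAKETYPE(DEMO_32F, 1): return DEMO_FOURCC('F', '0', '3', '2');
    default:                         return 0;
    }
}
```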
@ -3177,6 +3177,27 @@ public:
void getMapping(vx_uint32 dst_x, vx_uint32 dst_y, vx_float32 &src_x, vx_float32 &src_y) const
{ IVX_CHECK_STATUS(vxGetRemapPoint(ref, dst_x, dst_y, &src_x, &src_y)); }
void setMappings(vx_float32* map_x, size_t map_x_stride, vx_float32* map_y, size_t map_y_stride)
{
for (vx_uint32 y = 0; y < dstHeight(); y++)
{
const vx_float32* map_x_line = (vx_float32*)((char*)map_x + y*map_x_stride);
const vx_float32* map_y_line = (vx_float32*)((char*)map_y + y*map_y_stride);
for (vx_uint32 x = 0; x < dstWidth(); x++)
setMapping(x, y, map_x_line[x], map_y_line[x]);
}
}
void setMappings(vx_float32* map, size_t map_stride)
{
for (vx_uint32 y = 0; y < dstHeight(); y++)
{
const vx_float32* map_line = (vx_float32*)((char*)map + y*map_stride);
for (vx_uint32 x = 0; x < 2*dstWidth(); x+=2)
setMapping(x/2, y, map_line[x], map_line[x+1]); // x steps over interleaved floats; destination column is x/2
}
}
#ifdef IVX_USE_OPENCV
void setMappings(const cv::Mat& map_x, const cv::Mat& map_y)
{

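The interleaved `setMappings(map, map_stride)` overload does two things. It addresses each row through a byte stride, and it reads two floats (source x, source y) per destination pixel. That means the float index must be halved to get the destination column. Below is a minimal C sketch of that unpacking. It writes into a plain row-major array rather than calling `vxSetRemapPoint`, and `demo_remap_point` and `demo_unpack_interleaved_map` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { float src_x, src_y; } demo_remap_point;

/* map: rows of interleaved (src_x, src_y) pairs, map_stride bytes apart
 * (rows may carry padding). out: width * height entries, row-major. */
void demo_unpack_interleaved_map(const float *map, size_t map_stride,
                                 uint32_t width, uint32_t height,
                                 demo_remap_point *out) {
    for (uint32_t y = 0; y < height; y++) {
        const float *line = (const float *)((const char *)map + y * map_stride);
        for (uint32_t i = 0; i < 2 * width; i += 2) {
            /* i indexes floats; the destination column is i / 2 */
            demo_remap_point *p = &out[(size_t)y * width + i / 2];
            p->src_x = line[i];
            p->src_y = line[i + 1];
        }
    }
}
```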
File diff suppressed because it is too large

@ -1,4 +1,4 @@
(C) 1995-2013 Jean-loup Gailly and Mark Adler
(C) 1995-2024 Jean-loup Gailly and Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages

@ -21,7 +21,6 @@ Features
* Support for CPU intrinsics when available
* Adler32 implementation using SSSE3, AVX2, AVX512, AVX512-VNNI, Neon, VMX & VSX
* CRC32-B implementation using PCLMULQDQ, VPCLMULQDQ, ACLE, & IBM Z
* Hash table implementation using CRC32-C intrinsics on x86 and ARM
* Slide hash implementations using SSE2, AVX2, ARMv6, Neon, VMX & VSX
* Compare256 implementations using SSE2, AVX2, Neon, POWER9 & RVV
* Inflate chunk copying using SSE2, SSSE3, AVX, Neon & VSX
@ -95,20 +94,21 @@ make test
Build Options
-------------
| CMake | configure | Description | Default |
|:-------------------------|:-------------------------|:--------------------------------------------------------------------------------------|---------|
| ZLIB_COMPAT | --zlib-compat | Compile with zlib compatible API | OFF |
| ZLIB_ENABLE_TESTS | | Build test binaries | ON |
| WITH_GZFILEOP | --without-gzfileops | Compile with support for gzFile related functions | ON |
| WITH_OPTIM | --without-optimizations | Build with optimisations | ON |
| WITH_NEW_STRATEGIES | --without-new-strategies | Use new strategies | ON |
| WITH_NATIVE_INSTRUCTIONS | | Compiles with full instruction set supported on this host (gcc/clang -march=native) | OFF |
| WITH_SANITIZER | | Build with sanitizer (memory, address, undefined) | OFF |
| WITH_GTEST | | Build gtest_zlib | ON |
| WITH_FUZZERS | | Build test/fuzz | OFF |
| WITH_BENCHMARKS | | Build test/benchmarks | OFF |
| WITH_MAINTAINER_WARNINGS | | Build with project maintainer warnings | OFF |
| WITH_CODE_COVERAGE | | Enable code coverage reporting | OFF |
| CMake | configure | Description | Default |
|:---------------------------|:-------------------------|:------------------------------------------------------------------------------------|---------|
| ZLIB_COMPAT | --zlib-compat | Compile with zlib compatible API | OFF |
| ZLIB_ENABLE_TESTS | | Build test binaries | ON |
| WITH_GZFILEOP | --without-gzfileops | Compile with support for gzFile related functions | ON |
| WITH_OPTIM | --without-optimizations | Build with optimisations | ON |
| WITH_NEW_STRATEGIES | --without-new-strategies | Use new strategies | ON |
| WITH_NATIVE_INSTRUCTIONS | | Compiles with full instruction set supported on this host (gcc/clang -march=native) | OFF |
| WITH_RUNTIME_CPU_DETECTION | | Compiles with runtime CPU detection | ON |
| WITH_SANITIZER | | Build with sanitizer (memory, address, undefined) | OFF |
| WITH_GTEST | | Build gtest_zlib | ON |
| WITH_FUZZERS | | Build test/fuzz | OFF |
| WITH_BENCHMARKS | | Build test/benchmarks | OFF |
| WITH_MAINTAINER_WARNINGS | | Build with project maintainer warnings | OFF |
| WITH_CODE_COVERAGE | | Enable code coverage reporting | OFF |
Install

@ -7,70 +7,24 @@
#include "functable.h"
#include "adler32_p.h"
/* ========================================================================= */
Z_INTERNAL uint32_t adler32_c(uint32_t adler, const uint8_t *buf, size_t len) {
uint32_t sum2;
unsigned n;
/* split Adler-32 into component sums */
sum2 = (adler >> 16) & 0xffff;
adler &= 0xffff;
/* in case user likes doing a byte at a time, keep it fast */
if (UNLIKELY(len == 1))
return adler32_len_1(adler, buf, sum2);
/* initial Adler-32 value (deferred check for len == 1 speed) */
if (UNLIKELY(buf == NULL))
return 1L;
/* in case short lengths are provided, keep it somewhat fast */
if (UNLIKELY(len < 16))
return adler32_len_16(adler, buf, len, sum2);
/* do length NMAX blocks -- requires just one modulo operation */
while (len >= NMAX) {
len -= NMAX;
#ifdef UNROLL_MORE
n = NMAX / 16; /* NMAX is divisible by 16 */
#else
n = NMAX / 8; /* NMAX is divisible by 8 */
#endif
do {
#ifdef UNROLL_MORE
DO16(adler, sum2, buf); /* 16 sums unrolled */
buf += 16;
#else
DO8(adler, sum2, buf, 0); /* 8 sums unrolled */
buf += 8;
#endif
} while (--n);
adler %= BASE;
sum2 %= BASE;
}
/* do remaining bytes (less than NMAX, still just one modulo) */
return adler32_len_64(adler, buf, len, sum2);
}
#ifdef ZLIB_COMPAT
unsigned long Z_EXPORT PREFIX(adler32_z)(unsigned long adler, const unsigned char *buf, size_t len) {
return (unsigned long)functable.adler32((uint32_t)adler, buf, len);
return (unsigned long)FUNCTABLE_CALL(adler32)((uint32_t)adler, buf, len);
}
#else
uint32_t Z_EXPORT PREFIX(adler32_z)(uint32_t adler, const unsigned char *buf, size_t len) {
return functable.adler32(adler, buf, len);
return FUNCTABLE_CALL(adler32)(adler, buf, len);
}
#endif
/* ========================================================================= */
#ifdef ZLIB_COMPAT
unsigned long Z_EXPORT PREFIX(adler32)(unsigned long adler, const unsigned char *buf, unsigned int len) {
return (unsigned long)functable.adler32((uint32_t)adler, buf, len);
return (unsigned long)FUNCTABLE_CALL(adler32)((uint32_t)adler, buf, len);
}
#else
uint32_t Z_EXPORT PREFIX(adler32)(uint32_t adler, const unsigned char *buf, uint32_t len) {
return functable.adler32(adler, buf, len);
return FUNCTABLE_CALL(adler32)(adler, buf, len);
}
#endif

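Replacing `functable.adler32(...)` with `FUNCTABLE_CALL(adler32)(...)` routes every call site through one macro. The same source line can therefore bind either to a function-pointer table filled at runtime, or directly to a `native_*` symbol when runtime CPU detection is disabled. Below is a minimal sketch of that pattern under our own assumptions. `DEMO_CALL`, `demo_functable`, `checksum_c`, and `DEMO_DISABLE_RUNTIME_CPU_DETECTION` are hypothetical names, and zlib-ng's real macros may differ in detail. The one-time stub initialization shown here is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*checksum_func)(uint32_t acc, const uint8_t *buf, size_t len);

/* Portable fallback, standing in for a *_c implementation. */
static uint32_t checksum_c(uint32_t acc, const uint8_t *buf, size_t len) {
    while (len--)
        acc += *buf++;
    return acc;
}

#ifdef DEMO_DISABLE_RUNTIME_CPU_DETECTION
/* Compile-time binding: DEMO_CALL(checksum) -> native_checksum -> checksum_c. */
#  define native_checksum checksum_c
#  define DEMO_CALL(name) native_##name
#else
/* Runtime binding: calls go through a table of function pointers. */
struct demo_functable_s {
    checksum_func checksum;
};

static uint32_t checksum_stub(uint32_t acc, const uint8_t *buf, size_t len);

static struct demo_functable_s demo_functable = { checksum_stub };

/* First call lands here; a real dispatcher would probe CPU features and pick
 * a SIMD variant, then patch the table so later calls skip the stub. */
static uint32_t checksum_stub(uint32_t acc, const uint8_t *buf, size_t len) {
    demo_functable.checksum = checksum_c;
    return demo_functable.checksum(acc, buf, len);
}
#  define DEMO_CALL(name) demo_functable.name
#endif

uint32_t demo_checksum(uint32_t acc, const uint8_t *buf, size_t len) {
    return DEMO_CALL(checksum)(acc, buf, len);
}
```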
@ -1,11 +0,0 @@
/* adler32_fold.h -- adler32 folding interface
* Copyright (C) 2022 Adam Stylinski
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef ADLER32_FOLD_H_
#define ADLER32_FOLD_H_
Z_INTERNAL uint32_t adler32_fold_copy_c(uint32_t adler, uint8_t *dst, const uint8_t *src, size_t len);
#endif

@ -1,2 +0,0 @@
# ignore Makefiles; they're all automatically generated
Makefile

@ -25,7 +25,6 @@ all: \
crc32_acle.o crc32_acle.lo \
slide_hash_neon.o slide_hash_neon.lo \
slide_hash_armv6.o slide_hash_armv6.lo \
insert_string_acle.o insert_string_acle.lo
adler32_neon.o:
$(CC) $(CFLAGS) $(NEONFLAG) $(NOLTOFLAG) $(INCLUDES) -c -o $@ $(SRCDIR)/adler32_neon.c
@ -69,12 +68,6 @@ slide_hash_armv6.o:
slide_hash_armv6.lo:
$(CC) $(SFLAGS) $(ARMV6FLAG) $(NOLTOFLAG) $(INCLUDES) -c -o $@ $(SRCDIR)/slide_hash_armv6.c
insert_string_acle.o:
$(CC) $(CFLAGS) $(ACLEFLAG) $(NOLTOFLAG) $(INCLUDES) -c -o $@ $(SRCDIR)/insert_string_acle.c
insert_string_acle.lo:
$(CC) $(SFLAGS) $(ACLEFLAG) $(NOLTOFLAG) $(INCLUDES) -c -o $@ $(SRCDIR)/insert_string_acle.c
mostlyclean: clean
clean:
rm -f *.o *.lo *~

@ -7,8 +7,8 @@
*/
#ifdef ARM_NEON
#include "neon_intrins.h"
#include "../../zbuild.h"
#include "../../adler32_p.h"
#include "zbuild.h"
#include "adler32_p.h"
static void NEON_accum32(uint32_t *s, const uint8_t *buf, size_t len) {
static const uint16_t ALIGNED_(16) taps[64] = {

@ -1,4 +1,4 @@
#include "../../zbuild.h"
#include "zbuild.h"
#include "arm_features.h"
#if defined(__linux__) && defined(HAVE_SYS_AUXV_H)
@ -11,6 +11,11 @@
# ifndef ID_AA64ISAR0_CRC32_VAL
# define ID_AA64ISAR0_CRC32_VAL ID_AA64ISAR0_CRC32
# endif
#elif defined(__OpenBSD__) && defined(__aarch64__)
# include <machine/armreg.h>
# include <machine/cpu.h>
# include <sys/sysctl.h>
# include <sys/types.h>
#elif defined(__APPLE__)
# if !defined(_DARWIN_C_SOURCE)
# define _DARWIN_C_SOURCE /* enable types aliases (eg u_int) */
@ -30,6 +35,16 @@ static int arm_has_crc32() {
#elif defined(__FreeBSD__) && defined(__aarch64__)
return getenv("QEMU_EMULATING") == NULL
&& ID_AA64ISAR0_CRC32_VAL(READ_SPECIALREG(id_aa64isar0_el1)) >= ID_AA64ISAR0_CRC32_BASE;
#elif defined(__OpenBSD__) && defined(__aarch64__)
int hascrc32 = 0;
int isar0_mib[] = { CTL_MACHDEP, CPU_ID_AA64ISAR0 };
uint64_t isar0 = 0;
size_t len = sizeof(isar0);
if (sysctl(isar0_mib, 2, &isar0, &len, NULL, 0) != -1) {
if (ID_AA64ISAR0_CRC32(isar0) >= ID_AA64ISAR0_CRC32_BASE)
hascrc32 = 1;
}
return hascrc32;
#elif defined(__APPLE__)
int hascrc32;
size_t size = sizeof(hascrc32);

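The new OpenBSD branch reads `ID_AA64ISAR0_EL1` via `sysctl` and compares its CRC32 field against `ID_AA64ISAR0_CRC32_BASE`. The field test itself is plain bit arithmetic. In the Arm architecture the CRC32 field occupies bits [19:16], and a value of 1 or more means the CRC32 instructions are implemented. Below is a sketch of just that check. The `demo_isar0_*` helpers are hypothetical; they are not OpenBSD's `armreg.h` macros, which we do not reproduce here.

```c
#include <stdint.h>

/* ID_AA64ISAR0_EL1.CRC32 lives in bits [19:16]. */
#define DEMO_ISAR0_CRC32_SHIFT 16
#define DEMO_ISAR0_CRC32_MASK  0xfULL

static unsigned demo_isar0_crc32_field(uint64_t isar0) {
    return (unsigned)((isar0 >> DEMO_ISAR0_CRC32_SHIFT) & DEMO_ISAR0_CRC32_MASK);
}

/* Nonzero when the register value advertises the CRC32 instructions. */
int demo_isar0_has_crc32(uint64_t isar0) {
    return demo_isar0_crc32_field(isar0) >= 1;
}
```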
@ -2,8 +2,8 @@
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef ARM_H_
#define ARM_H_
#ifndef ARM_FEATURES_H_
#define ARM_FEATURES_H_
struct arm_cpu_features {
int has_simd;
@ -13,4 +13,4 @@ struct arm_cpu_features {
void Z_INTERNAL arm_check_features(struct arm_cpu_features *features);
#endif /* ARM_H_ */
#endif /* ARM_FEATURES_H_ */

@ -0,0 +1,65 @@
/* arm_functions.h -- ARM implementations for arch-specific functions.
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef ARM_FUNCTIONS_H_
#define ARM_FUNCTIONS_H_
#ifdef ARM_NEON
uint32_t adler32_neon(uint32_t adler, const uint8_t *buf, size_t len);
uint32_t chunksize_neon(void);
uint8_t* chunkmemset_safe_neon(uint8_t *out, unsigned dist, unsigned len, unsigned left);
# ifdef HAVE_BUILTIN_CTZLL
uint32_t compare256_neon(const uint8_t *src0, const uint8_t *src1);
uint32_t longest_match_neon(deflate_state *const s, Pos cur_match);
uint32_t longest_match_slow_neon(deflate_state *const s, Pos cur_match);
# endif
void slide_hash_neon(deflate_state *s);
void inflate_fast_neon(PREFIX3(stream) *strm, uint32_t start);
#endif
#ifdef ARM_ACLE
uint32_t crc32_acle(uint32_t crc, const uint8_t *buf, size_t len);
#endif
#ifdef ARM_SIMD
void slide_hash_armv6(deflate_state *s);
#endif
#ifdef DISABLE_RUNTIME_CPU_DETECTION
// ARM - SIMD
# if (defined(ARM_SIMD) && defined(__ARM_FEATURE_SIMD32)) || defined(ARM_NOCHECK_SIMD)
# undef native_slide_hash
# define native_slide_hash slide_hash_armv6
# endif
// ARM - NEON
# if (defined(ARM_NEON) && (defined(__ARM_NEON__) || defined(__ARM_NEON))) || ARM_NOCHECK_NEON
# undef native_adler32
# define native_adler32 adler32_neon
# undef native_chunkmemset_safe
# define native_chunkmemset_safe chunkmemset_safe_neon
# undef native_chunksize
# define native_chunksize chunksize_neon
# undef native_inflate_fast
# define native_inflate_fast inflate_fast_neon
# undef native_slide_hash
# define native_slide_hash slide_hash_neon
# ifdef HAVE_BUILTIN_CTZLL
# undef native_compare256
# define native_compare256 compare256_neon
# undef native_longest_match
# define native_longest_match longest_match_neon
# undef native_longest_match_slow
# define native_longest_match_slow longest_match_slow_neon
# endif
# endif
// ARM - ACLE
# if defined(ARM_ACLE) && defined(__ARM_ACLE) && defined(__ARM_FEATURE_CRC32)
# undef native_crc32
# define native_crc32 crc32_acle
# endif
#endif
#endif /* ARM_FUNCTIONS_H_ */

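With `DISABLE_RUNTIME_CPU_DETECTION`, the `native_*` names are resolved purely by the preprocessor. The generic header defines a portable default. Each arch header then `#undef`s and redefines the name, but only when compiler-predefined macros guarantee the instruction set, because no runtime check will follow. Below is a minimal sketch of that layering. `demo_impl` and `native_demo_impl` are hypothetical names, and we test only compiler macros (`__ARM_NEON`, `_ARCH_PWR8`, `__VSX__`), not the build-system flags such as `ARM_NEON` that the real headers also require.

```c
/* Hypothetical identifiers for the implementation a build binds to. */
enum demo_impl { DEMO_IMPL_C, DEMO_IMPL_NEON, DEMO_IMPL_POWER8 };

/* Generic layer (cf. generic_functions.h): always provide a portable default. */
#define native_demo_impl DEMO_IMPL_C

/* Arch layer (cf. arm_functions.h): override only on compiler-guaranteed NEON. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  undef native_demo_impl
#  define native_demo_impl DEMO_IMPL_NEON
#endif

/* Another arch layer (cf. power_functions.h), same pattern; last override wins. */
#if defined(_ARCH_PWR8) && defined(__VSX__)
#  undef native_demo_impl
#  define native_demo_impl DEMO_IMPL_POWER8
#endif

int demo_selected_impl(void) {
    return native_demo_impl;
}
```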
@ -4,8 +4,8 @@
#ifdef ARM_NEON
#include "neon_intrins.h"
#include "../../zbuild.h"
#include "../generic/chunk_permute_table.h"
#include "zbuild.h"
#include "arch/generic/chunk_permute_table.h"
typedef uint8x16_t chunk_t;

@ -3,8 +3,9 @@
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#include "../../zbuild.h"
#include "zbuild.h"
#include "zutil_p.h"
#include "deflate.h"
#include "fallback_builtins.h"
#if defined(ARM_NEON) && defined(HAVE_BUILTIN_CTZLL)

@ -7,7 +7,7 @@
#ifdef ARM_ACLE
#include "acle_intrins.h"
#include "../../zbuild.h"
#include "zbuild.h"
Z_INTERNAL Z_TARGET_CRC uint32_t crc32_acle(uint32_t crc, const uint8_t *buf, size_t len) {
Z_REGISTER uint32_t c;

@ -1,24 +0,0 @@
/* insert_string_acle.c -- insert_string integer hash variant using ACLE's CRC instructions
*
* Copyright (C) 1995-2013 Jean-loup Gailly and Mark Adler
* For conditions of distribution and use, see copyright notice in zlib.h
*
*/
#ifdef ARM_ACLE
#include "acle_intrins.h"
#include "../../zbuild.h"
#include "../../deflate.h"
#define HASH_CALC(s, h, val) \
h = __crc32w(0, val)
#define HASH_CALC_VAR h
#define HASH_CALC_VAR_INIT uint32_t h = 0
#define UPDATE_HASH Z_TARGET_CRC update_hash_acle
#define INSERT_STRING Z_TARGET_CRC insert_string_acle
#define QUICK_INSERT_STRING Z_TARGET_CRC quick_insert_string_acle
#include "../../insert_string_tpl.h"
#endif

@ -25,6 +25,13 @@
out.val[3] = vqsubq_u16(a.val[3], b); \
} while (0)
# if defined(__clang__) && defined(__arm__) && defined(__ANDROID__)
/* Clang for 32-bit Android has too strict alignment requirement (:256) for x4 NEON intrinsics */
# undef ARM_NEON_HASLD4
# undef vld1q_u16_x4
# undef vld1q_u8_x4
# undef vst1q_u16_x4
# endif
# ifndef ARM_NEON_HASLD4

@ -5,8 +5,8 @@
#if defined(ARM_SIMD)
#include "acle_intrins.h"
#include "../../zbuild.h"
#include "../../deflate.h"
#include "zbuild.h"
#include "deflate.h"
/* SIMD version of hash_chain rebase */
static inline void slide_hash_chain(Pos *table, uint32_t entries, uint16_t wsize) {

@ -10,8 +10,8 @@
#ifdef ARM_NEON
#include "neon_intrins.h"
#include "../../zbuild.h"
#include "../../deflate.h"
#include "zbuild.h"
#include "deflate.h"
/* SIMD version of hash_chain rebase */
static inline void slide_hash_chain(Pos *table, uint32_t entries, uint16_t wsize) {

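Both `slide_hash_chain` variants rebase each hash-chain entry by `wsize` when the window slides. Entries that would fall below the new window start become 0 rather than wrapping around, which is exactly an unsigned saturating subtract. The NEON helpers above use `vqsubq_u16` for this. Below is a scalar sketch of the same operation, assuming `Pos` is a 16-bit unsigned type; `slide_hash_chain_scalar` is a hypothetical name.

```c
#include <stdint.h>

/* Scalar reference: table[i] = max(table[i] - wsize, 0) without wraparound. */
void slide_hash_chain_scalar(uint16_t *table, uint32_t entries, uint16_t wsize) {
    for (uint32_t i = 0; i < entries; i++) {
        uint16_t m = table[i];
        table[i] = (uint16_t)(m >= wsize ? m - wsize : 0);
    }
}
```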
@ -1,5 +1,6 @@
# Makefile for zlib
# Makefile for zlib-ng
# Copyright (C) 1995-2013 Jean-loup Gailly, Mark Adler
# Copyright (C) 2024 Hans Kristian Rosbach
# For conditions of distribution and use, see copyright notice in zlib.h
CC=
@ -11,12 +12,62 @@ SRCDIR=.
SRCTOP=../..
TOPDIR=$(SRCTOP)
all:
all: \
adler32_c.o adler32_c.lo \
adler32_fold_c.o adler32_fold_c.lo \
chunkset_c.o chunkset_c.lo \
compare256_c.o compare256_c.lo \
crc32_braid_c.o crc32_braid_c.lo \
crc32_fold_c.o crc32_fold_c.lo \
slide_hash_c.o slide_hash_c.lo
adler32_c.o: $(SRCDIR)/adler32_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/adler32_p.h
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/adler32_c.c
adler32_c.lo: $(SRCDIR)/adler32_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/adler32_p.h
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/adler32_c.c
adler32_fold_c.o: $(SRCDIR)/adler32_fold_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/functable.h
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/adler32_fold_c.c
adler32_fold_c.lo: $(SRCDIR)/adler32_fold_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/functable.h
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/adler32_fold_c.c
chunkset_c.o: $(SRCDIR)/chunkset_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/chunkset_tpl.h $(SRCTOP)/inffast_tpl.h
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/chunkset_c.c
chunkset_c.lo: $(SRCDIR)/chunkset_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/chunkset_tpl.h $(SRCTOP)/inffast_tpl.h
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/chunkset_c.c
compare256_c.o: $(SRCDIR)/compare256_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/zutil_p.h $(SRCTOP)/deflate.h $(SRCTOP)/fallback_builtins.h
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/compare256_c.c
compare256_c.lo: $(SRCDIR)/compare256_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/zutil_p.h $(SRCTOP)/deflate.h $(SRCTOP)/fallback_builtins.h
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/compare256_c.c
crc32_braid_c.o: $(SRCDIR)/crc32_braid_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/crc32_braid_p.h $(SRCTOP)/crc32_braid_tbl.h
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/crc32_braid_c.c
crc32_braid_c.lo: $(SRCDIR)/crc32_braid_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/crc32_braid_p.h $(SRCTOP)/crc32_braid_tbl.h
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/crc32_braid_c.c
crc32_fold_c.o: $(SRCDIR)/crc32_fold_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/functable.h
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/crc32_fold_c.c
crc32_fold_c.lo: $(SRCDIR)/crc32_fold_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/functable.h
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/crc32_fold_c.c
slide_hash_c.o: $(SRCDIR)/slide_hash_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/deflate.h
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/slide_hash_c.c
slide_hash_c.lo: $(SRCDIR)/slide_hash_c.c $(SRCTOP)/zbuild.h $(SRCTOP)/deflate.h
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/slide_hash_c.c
mostlyclean: clean
clean:
rm -f *.o *.lo *~ \
rm -f *.o *.lo *~
rm -rf objs
rm -f *.gcda *.gcno *.gcov

@ -0,0 +1,54 @@
/* adler32.c -- compute the Adler-32 checksum of a data stream
* Copyright (C) 1995-2011, 2016 Mark Adler
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#include "zbuild.h"
#include "functable.h"
#include "adler32_p.h"
/* ========================================================================= */
Z_INTERNAL uint32_t adler32_c(uint32_t adler, const uint8_t *buf, size_t len) {
uint32_t sum2;
unsigned n;
/* split Adler-32 into component sums */
sum2 = (adler >> 16) & 0xffff;
adler &= 0xffff;
/* in case user likes doing a byte at a time, keep it fast */
if (UNLIKELY(len == 1))
return adler32_len_1(adler, buf, sum2);
/* initial Adler-32 value (deferred check for len == 1 speed) */
if (UNLIKELY(buf == NULL))
return 1L;
/* in case short lengths are provided, keep it somewhat fast */
if (UNLIKELY(len < 16))
return adler32_len_16(adler, buf, len, sum2);
/* do length NMAX blocks -- requires just one modulo operation */
while (len >= NMAX) {
len -= NMAX;
#ifdef UNROLL_MORE
n = NMAX / 16; /* NMAX is divisible by 16 */
#else
n = NMAX / 8; /* NMAX is divisible by 8 */
#endif
do {
#ifdef UNROLL_MORE
DO16(adler, sum2, buf); /* 16 sums unrolled */
buf += 16;
#else
DO8(adler, sum2, buf, 0); /* 8 sums unrolled */
buf += 8;
#endif
} while (--n);
adler %= BASE;
sum2 %= BASE;
}
/* do remaining bytes (less than NMAX, still just one modulo) */
return adler32_len_64(adler, buf, len, sum2);
}

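`adler32_c` keeps two running sums, `a` (1 plus the byte sum) and `b` (the sum of the successive `a` values), both modulo 65521. It defers the modulo until `NMAX` bytes have been added, because 5552 is the largest block for which the 32-bit sums cannot overflow. Below is a minimal sketch of the same arithmetic without the `len == 1` / `len < 16` fast paths or the `DO8`/`DO16` unrolling. `adler32_ref`, `ADLER_BASE`, and `ADLER_NMAX` are our own names.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521U /* largest prime below 2^16 */
#define ADLER_NMAX 5552U  /* max bytes per block before a 32-bit sum could overflow */

uint32_t adler32_ref(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffff;
    uint32_t b = (adler >> 16) & 0xffff;
    if (buf == NULL)
        return 1; /* like adler32_c: a NULL buffer yields the initial value */
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_BASE; /* one modulo per block, as in the NMAX loop above */
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Because the state is just the pair of reduced sums, feeding a buffer in pieces gives the same result as one call over the whole buffer.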
@ -5,12 +5,11 @@
#include "zbuild.h"
#include "functable.h"
#include "adler32_fold.h"
#include <limits.h>
Z_INTERNAL uint32_t adler32_fold_copy_c(uint32_t adler, uint8_t *dst, const uint8_t *src, size_t len) {
adler = functable.adler32(adler, src, len);
adler = FUNCTABLE_CALL(adler32)(adler, src, len);
memcpy(dst, src, len);
return adler;
}

@ -5,6 +5,7 @@
#include "zbuild.h"
#include "zutil_p.h"
#include "deflate.h"
#include "fallback_builtins.h"
/* ALIGNED, byte comparison */

@ -8,43 +8,9 @@
*/
#include "zbuild.h"
#include "zutil.h"
#include "functable.h"
#include "crc32_braid_p.h"
#include "crc32_braid_tbl.h"
/* ========================================================================= */
const uint32_t * Z_EXPORT PREFIX(get_crc_table)(void) {
return (const uint32_t *)crc_table;
}
#ifdef ZLIB_COMPAT
unsigned long Z_EXPORT PREFIX(crc32_z)(unsigned long crc, const unsigned char *buf, size_t len) {
if (buf == NULL) return 0;
return (unsigned long)functable.crc32((uint32_t)crc, buf, len);
}
#else
uint32_t Z_EXPORT PREFIX(crc32_z)(uint32_t crc, const unsigned char *buf, size_t len) {
if (buf == NULL) return 0;
return functable.crc32(crc, buf, len);
}
#endif
#ifdef ZLIB_COMPAT
unsigned long Z_EXPORT PREFIX(crc32)(unsigned long crc, const unsigned char *buf, unsigned int len) {
return (unsigned long)PREFIX(crc32_z)((uint32_t)crc, buf, len);
}
#else
uint32_t Z_EXPORT PREFIX(crc32)(uint32_t crc, const unsigned char *buf, uint32_t len) {
return PREFIX(crc32_z)(crc, buf, len);
}
#endif
/* ========================================================================= */
/*
A CRC of a message is computed on N braids of words in the message, where
each word consists of W bytes (4 or 8). If N is 3, for example, then three
@ -66,24 +32,6 @@ uint32_t Z_EXPORT PREFIX(crc32)(uint32_t crc, const unsigned char *buf, uint32_t
level. Your mileage may vary.
*/
/* ========================================================================= */
#if BYTE_ORDER == LITTLE_ENDIAN
# define ZSWAPWORD(word) (word)
# define BRAID_TABLE crc_braid_table
#elif BYTE_ORDER == BIG_ENDIAN
# if W == 8
# define ZSWAPWORD(word) ZSWAP64(word)
# elif W == 4
# define ZSWAPWORD(word) ZSWAP32(word)
# endif
# define BRAID_TABLE crc_braid_big_table
#else
# error "No endian defined"
#endif
#define DO1 c = crc_table[(c ^ *buf++) & 0xff] ^ (c >> 8)
#define DO8 DO1; DO1; DO1; DO1; DO1; DO1; DO1; DO1
/* ========================================================================= */
#ifdef W
/*
@ -112,7 +60,7 @@ static z_word_t crc_word(z_word_t data) {
/* ========================================================================= */
Z_INTERNAL uint32_t PREFIX(crc32_braid)(uint32_t crc, const uint8_t *buf, size_t len) {
Z_REGISTER uint32_t c;
uint32_t c;
/* Pre-condition the CRC */
c = (~crc) & 0xffffffff;

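`DO1` is the classic byte-at-a-time, table-driven CRC-32 step. The braided code processes N interleaved words as an optimization of the same computation. Below is a minimal sketch of only the byte-wise form, assuming the reflected polynomial 0xEDB88320 used by zlib. `crc32_bytewise` and `crc_table_demo` are hypothetical names, and the lazily built table is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table_demo[256];
static int crc_table_ready = 0;

/* Build the 256-entry lookup table for the reflected CRC-32 polynomial. */
static void crc_table_demo_init(void) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table_demo[n] = c;
    }
    crc_table_ready = 1;
}

uint32_t crc32_bytewise(uint32_t crc, const uint8_t *buf, size_t len) {
    uint32_t c;
    if (!crc_table_ready)
        crc_table_demo_init();
    c = ~crc; /* pre-condition the CRC */
    while (len--)
        c = crc_table_demo[(c ^ *buf++) & 0xff] ^ (c >> 8); /* same step as DO1 */
    return ~c; /* post-condition */
}
```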
@ -3,11 +3,9 @@
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#include "zbuild.h"
#include "zutil.h"
#include "functable.h"
#include "crc32_fold.h"
#include <limits.h>
#include "crc32.h"
Z_INTERNAL uint32_t crc32_fold_reset_c(crc32_fold *crc) {
crc->value = CRC32_INITIAL_VALUE;
@ -15,7 +13,7 @@ Z_INTERNAL uint32_t crc32_fold_reset_c(crc32_fold *crc) {
}
Z_INTERNAL void crc32_fold_copy_c(crc32_fold *crc, uint8_t *dst, const uint8_t *src, size_t len) {
crc->value = functable.crc32(crc->value, src, len);
crc->value = FUNCTABLE_CALL(crc32)(crc->value, src, len);
memcpy(dst, src, len);
}
@ -25,7 +23,7 @@ Z_INTERNAL void crc32_fold_c(crc32_fold *crc, const uint8_t *src, size_t len, ui
* same arguments for the versions that _do_ do a folding CRC but we don't want a copy. The
* init_crc is an unused argument in this context */
Z_UNUSED(init_crc);
crc->value = functable.crc32(crc->value, src, len);
crc->value = FUNCTABLE_CALL(crc32)(crc->value, src, len);
}
Z_INTERNAL uint32_t crc32_fold_final_c(crc32_fold *crc) {

@ -0,0 +1,106 @@
/* generic_functions.h -- generic C implementations for arch-specific functions.
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef GENERIC_FUNCTIONS_H_
#define GENERIC_FUNCTIONS_H_
#include "zendian.h"
Z_INTERNAL uint32_t crc32_fold_reset_c(crc32_fold *crc);
Z_INTERNAL void crc32_fold_copy_c(crc32_fold *crc, uint8_t *dst, const uint8_t *src, size_t len);
Z_INTERNAL void crc32_fold_c(crc32_fold *crc, const uint8_t *src, size_t len, uint32_t init_crc);
Z_INTERNAL uint32_t crc32_fold_final_c(crc32_fold *crc);
Z_INTERNAL uint32_t adler32_fold_copy_c(uint32_t adler, uint8_t *dst, const uint8_t *src, size_t len);
typedef uint32_t (*adler32_func)(uint32_t adler, const uint8_t *buf, size_t len);
typedef uint32_t (*compare256_func)(const uint8_t *src0, const uint8_t *src1);
typedef uint32_t (*crc32_func)(uint32_t crc32, const uint8_t *buf, size_t len);
uint32_t adler32_c(uint32_t adler, const uint8_t *buf, size_t len);
uint32_t chunksize_c(void);
uint8_t* chunkmemset_safe_c(uint8_t *out, unsigned dist, unsigned len, unsigned left);
void inflate_fast_c(PREFIX3(stream) *strm, uint32_t start);
uint32_t PREFIX(crc32_braid)(uint32_t crc, const uint8_t *buf, size_t len);
uint32_t compare256_c(const uint8_t *src0, const uint8_t *src1);
#if defined(UNALIGNED_OK) && BYTE_ORDER == LITTLE_ENDIAN
uint32_t compare256_unaligned_16(const uint8_t *src0, const uint8_t *src1);
# ifdef HAVE_BUILTIN_CTZ
uint32_t compare256_unaligned_32(const uint8_t *src0, const uint8_t *src1);
# endif
# if defined(UNALIGNED64_OK) && defined(HAVE_BUILTIN_CTZLL)
uint32_t compare256_unaligned_64(const uint8_t *src0, const uint8_t *src1);
# endif
#endif
typedef void (*slide_hash_func)(deflate_state *s);
void slide_hash_c(deflate_state *s);
uint32_t longest_match_c(deflate_state *const s, Pos cur_match);
# if defined(UNALIGNED_OK) && BYTE_ORDER == LITTLE_ENDIAN
uint32_t longest_match_unaligned_16(deflate_state *const s, Pos cur_match);
# ifdef HAVE_BUILTIN_CTZ
uint32_t longest_match_unaligned_32(deflate_state *const s, Pos cur_match);
# endif
# if defined(UNALIGNED64_OK) && defined(HAVE_BUILTIN_CTZLL)
uint32_t longest_match_unaligned_64(deflate_state *const s, Pos cur_match);
# endif
# endif
uint32_t longest_match_slow_c(deflate_state *const s, Pos cur_match);
# if defined(UNALIGNED_OK) && BYTE_ORDER == LITTLE_ENDIAN
uint32_t longest_match_slow_unaligned_16(deflate_state *const s, Pos cur_match);
uint32_t longest_match_slow_unaligned_32(deflate_state *const s, Pos cur_match);
# ifdef UNALIGNED64_OK
uint32_t longest_match_slow_unaligned_64(deflate_state *const s, Pos cur_match);
# endif
# endif
// Select generic implementation for longest_match, longest_match_slow and compare256 functions.
#if defined(UNALIGNED_OK) && BYTE_ORDER == LITTLE_ENDIAN
# if defined(UNALIGNED64_OK) && defined(HAVE_BUILTIN_CTZLL)
# define longest_match_generic longest_match_unaligned_64
# define longest_match_slow_generic longest_match_slow_unaligned_64
# define compare256_generic compare256_unaligned_64
# elif defined(HAVE_BUILTIN_CTZ)
# define longest_match_generic longest_match_unaligned_32
# define longest_match_slow_generic longest_match_slow_unaligned_32
# define compare256_generic compare256_unaligned_32
# else
# define longest_match_generic longest_match_unaligned_16
# define longest_match_slow_generic longest_match_slow_unaligned_16
# define compare256_generic compare256_unaligned_16
# endif
#else
# define longest_match_generic longest_match_c
# define longest_match_slow_generic longest_match_slow_c
# define compare256_generic compare256_c
#endif
#ifdef DISABLE_RUNTIME_CPU_DETECTION
// Generic code
# define native_adler32 adler32_c
# define native_adler32_fold_copy adler32_fold_copy_c
# define native_chunkmemset_safe chunkmemset_safe_c
# define native_chunksize chunksize_c
# define native_crc32 PREFIX(crc32_braid)
# define native_crc32_fold crc32_fold_c
# define native_crc32_fold_copy crc32_fold_copy_c
# define native_crc32_fold_final crc32_fold_final_c
# define native_crc32_fold_reset crc32_fold_reset_c
# define native_inflate_fast inflate_fast_c
# define native_slide_hash slide_hash_c
# define native_longest_match longest_match_generic
# define native_longest_match_slow longest_match_slow_generic
# define native_compare256 compare256_generic
#endif
#endif

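The generic selection prefers the `*_unaligned_64` variants when 64-bit unaligned access and `__builtin_ctzll` are available. The idea is to XOR 8 bytes at a time and, on a little-endian machine, count trailing zero bits to find the first differing byte. Below is a minimal sketch under our own assumptions. It uses `memcpy` for the loads instead of relying on `UNALIGNED_OK`, and it finishes the mismatching word byte by byte when the builtin or little-endian order cannot be assumed. `compare256_bytes` and `compare256_words` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time reference: length of the common prefix, capped at 256. */
uint32_t compare256_bytes(const uint8_t *a, const uint8_t *b) {
    uint32_t len = 0;
    while (len < 256 && a[len] == b[len])
        len++;
    return len;
}

/* Word-at-a-time variant: 8 bytes per step; memcpy does the unaligned loads. */
uint32_t compare256_words(const uint8_t *a, const uint8_t *b) {
    uint32_t len = 0;
    while (len < 256) {
        uint64_t wa, wb, diff;
        memcpy(&wa, a + len, sizeof wa);
        memcpy(&wb, b + len, sizeof wb);
        diff = wa ^ wb;
        if (diff != 0) {
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
            /* Little endian: the lowest set bit lies in the first differing byte. */
            return len + (uint32_t)__builtin_ctzll(diff) / 8;
#else
            /* Portable fallback: locate the mismatch inside this word. */
            while (a[len] == b[len])
                len++;
            return len;
#endif
        }
        len += 8;
    }
    return 256;
}
```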
@ -1,6 +1,6 @@
/* slide_hash.c -- slide hash table C implementation
*
* Copyright (C) 1995-2013 Jean-loup Gailly and Mark Adler
* Copyright (C) 1995-2024 Jean-loup Gailly and Mark Adler
* For conditions of distribution and use, see copyright notice in zlib.h
*/

@ -4,7 +4,7 @@
#ifdef POWER8_VSX
#include <altivec.h>
#include "../../zbuild.h"
#include "zbuild.h"
typedef vector unsigned char chunk_t;

@ -5,8 +5,10 @@
#ifdef POWER9
#include <altivec.h>
#include "../../zbuild.h"
#include "../../zendian.h"
#include "zbuild.h"
#include "zutil_p.h"
#include "deflate.h"
#include "zendian.h"
/* Older versions of GCC misimplemented semantics for these bit counting builtins.
* https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=3f30f2d1dbb3228b8468b26239fe60c2974ce2ac */

@ -1,16 +1,19 @@
/* power_features.c - POWER feature check
* Copyright (C) 2020 Matheus Castanho <msc@linux.ibm.com>, IBM
* Copyright (C) 2021-2022 Mika T. Lindqvist <postmaster@raasu.org>
* Copyright (C) 2021-2024 Mika T. Lindqvist <postmaster@raasu.org>
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifdef HAVE_SYS_AUXV_H
# include <sys/auxv.h>
#endif
#ifdef POWER_NEED_AUXVEC_H
# include <linux/auxvec.h>
#endif
#ifdef __FreeBSD__
# include <machine/cpu.h>
#endif
#include "../../zbuild.h"
#include "zbuild.h"
#include "power_features.h"
void Z_INTERNAL power_check_features(struct power_cpu_features *features) {

@ -4,8 +4,8 @@
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef POWER_H_
#define POWER_H_
#ifndef POWER_FEATURES_H_
#define POWER_FEATURES_H_
struct power_cpu_features {
int has_altivec;
@ -15,4 +15,4 @@ struct power_cpu_features {
void Z_INTERNAL power_check_features(struct power_cpu_features *features);
#endif /* POWER_H_ */
#endif /* POWER_FEATURES_H_ */

@ -0,0 +1,67 @@
/* power_functions.h -- POWER implementations for arch-specific functions.
* Copyright (C) 2020 Matheus Castanho <msc@linux.ibm.com>, IBM
* Copyright (C) 2021 Mika T. Lindqvist <postmaster@raasu.org>
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef POWER_FUNCTIONS_H_
#define POWER_FUNCTIONS_H_
#ifdef PPC_VMX
uint32_t adler32_vmx(uint32_t adler, const uint8_t *buf, size_t len);
void slide_hash_vmx(deflate_state *s);
#endif
#ifdef POWER8_VSX
uint32_t adler32_power8(uint32_t adler, const uint8_t *buf, size_t len);
uint32_t chunksize_power8(void);
uint8_t* chunkmemset_safe_power8(uint8_t *out, unsigned dist, unsigned len, unsigned left);
uint32_t crc32_power8(uint32_t crc, const uint8_t *buf, size_t len);
void slide_hash_power8(deflate_state *s);
void inflate_fast_power8(PREFIX3(stream) *strm, uint32_t start);
#endif
#ifdef POWER9
uint32_t compare256_power9(const uint8_t *src0, const uint8_t *src1);
uint32_t longest_match_power9(deflate_state *const s, Pos cur_match);
uint32_t longest_match_slow_power9(deflate_state *const s, Pos cur_match);
#endif
#ifdef DISABLE_RUNTIME_CPU_DETECTION
// Power - VMX
# if defined(PPC_VMX) && defined(__ALTIVEC__)
# undef native_adler32
# define native_adler32 adler32_vmx
# undef native_slide_hash
# define native_slide_hash slide_hash_vmx
# endif
// Power8 - VSX
# if defined(POWER8_VSX) && defined(_ARCH_PWR8) && defined(__VSX__)
# undef native_adler32
# define native_adler32 adler32_power8
# undef native_chunkmemset_safe
# define native_chunkmemset_safe chunkmemset_safe_power8
# undef native_chunksize
# define native_chunksize chunksize_power8
# undef native_inflate_fast
# define native_inflate_fast inflate_fast_power8
# undef native_slide_hash
# define native_slide_hash slide_hash_power8
# endif
# if defined(POWER8_VSX_CRC32) && defined(_ARCH_PWR8) && defined(__VSX__)
# undef native_crc32
# define native_crc32 crc32_power8
# endif
// Power9
# if defined(POWER9) && defined(_ARCH_PWR9)
# undef native_compare256
# define native_compare256 compare256_power9
# undef native_longest_match
# define native_longest_match longest_match_power9
# undef native_longest_match_slow
# define native_longest_match_slow longest_match_slow_power9
# endif
#endif
#endif /* POWER_FUNCTIONS_H_ */

@ -9,8 +9,8 @@
#include <riscv_vector.h>
#include <stdint.h>
#include "../../zbuild.h"
#include "../../adler32_p.h"
#include "zbuild.h"
#include "adler32_p.h"
static inline uint32_t adler32_rvv_impl(uint32_t adler, uint8_t* restrict dst, const uint8_t *src, size_t len, int COPY) {
/* split Adler-32 into component sums */

@ -6,7 +6,9 @@
#ifdef RISCV_RVV
#include "../../zbuild.h"
#include "zbuild.h"
#include "zutil_p.h"
#include "deflate.h"
#include "fallback_builtins.h"
#include <riscv_vector.h>

@ -1,10 +1,13 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>
#include <sys/utsname.h>
#include "../../zbuild.h"
#if defined(__linux__) && defined(HAVE_SYS_AUXV_H)
# include <sys/auxv.h>
#endif
#include "zbuild.h"
#include "riscv_features.h"
#define ISA_V_HWCAP (1 << ('v' - 'a'))
@ -33,7 +36,11 @@ void Z_INTERNAL riscv_check_features_compile_time(struct riscv_cpu_features *fea
}
void Z_INTERNAL riscv_check_features_runtime(struct riscv_cpu_features *features) {
#if defined(__linux__) && defined(HAVE_SYS_AUXV_H)
unsigned long hw_cap = getauxval(AT_HWCAP);
#else
unsigned long hw_cap = 0;
#endif
features->has_rvv = hw_cap & ISA_V_HWCAP;
}
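The runtime check above tests a single bit of `AT_HWCAP`. Linux reports single-letter RISC-V extensions at bit `letter - 'a'`, so `ISA_V_HWCAP` is bit 21 (`0x200000`). The following is a minimal sketch of the same test, generalized to any single-letter extension. The helper names are hypothetical, and `getauxval()` is only called behind the same `__linux__`/`HAVE_SYS_AUXV_H` guard the patch uses.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(HAVE_SYS_AUXV_H)
#  include <sys/auxv.h>
#endif

/* AT_HWCAP bit for a single-letter RISC-V extension ('v' -> bit 21). */
static unsigned long riscv_hwcap_bit(char ext) {
    return 1UL << (ext - 'a');
}

/* Test an already-read hwcap word for one extension letter. */
static bool riscv_hwcap_has(unsigned long hw_cap, char ext) {
    return (hw_cap & riscv_hwcap_bit(ext)) != 0;
}

/* Read AT_HWCAP where available; otherwise report no extensions, like the patch. */
static unsigned long riscv_read_hwcap(void) {
#if defined(__linux__) && defined(HAVE_SYS_AUXV_H)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}
```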

@ -6,8 +6,8 @@
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef RISCV_H_
#define RISCV_H_
#ifndef RISCV_FEATURES_H_
#define RISCV_FEATURES_H_
struct riscv_cpu_features {
int has_rvv;
@ -15,4 +15,4 @@ struct riscv_cpu_features {
void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features);
#endif /* RISCV_H_ */
#endif /* RISCV_FEATURES_H_ */

@ -0,0 +1,49 @@
/* riscv_functions.h -- RISCV implementations for arch-specific functions.
*
* Copyright (C) 2023 SiFive, Inc. All rights reserved.
* Contributed by Alex Chiang <alex.chiang@sifive.com>
*
* For conditions of distribution and use, see copyright notice in zlib.h
*/
#ifndef RISCV_FUNCTIONS_H_
#define RISCV_FUNCTIONS_H_
#ifdef RISCV_RVV
uint32_t adler32_rvv(uint32_t adler, const uint8_t *buf, size_t len);
uint32_t adler32_fold_copy_rvv(uint32_t adler, uint8_t *dst, const uint8_t *src, size_t len);
uint32_t chunksize_rvv(void);
uint8_t* chunkmemset_safe_rvv(uint8_t *out, unsigned dist, unsigned len, unsigned left);
uint32_t compare256_rvv(const uint8_t *src0, const uint8_t *src1);
uint32_t longest_match_rvv(deflate_state *const s, Pos cur_match);
uint32_t longest_match_slow_rvv(deflate_state *const s, Pos cur_match);
void slide_hash_rvv(deflate_state *s);
void inflate_fast_rvv(PREFIX3(stream) *strm, uint32_t start);
#endif
#ifdef DISABLE_RUNTIME_CPU_DETECTION
// RISCV - RVV
# if defined(RISCV_RVV) && defined(__riscv_v) && defined(__linux__)
# undef native_adler32
# define native_adler32 adler32_rvv
# undef native_adler32_fold_copy
# define native_adler32_fold_copy adler32_fold_copy_rvv
# undef native_chunkmemset_safe
# define native_chunkmemset_safe chunkmemset_safe_rvv
# undef native_chunksize
# define native_chunksize chunksize_rvv
# undef native_compare256
# define native_compare256 compare256_rvv
# undef native_inflate_fast
# define native_inflate_fast inflate_fast_rvv
# undef native_longest_match
# define native_longest_match longest_match_rvv
# undef native_longest_match_slow
# define native_longest_match_slow longest_match_slow_rvv
# undef native_slide_hash
# define native_slide_hash slide_hash_rvv
# endif
#endif
#endif /* RISCV_FUNCTIONS_H_ */
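With `DISABLE_RUNTIME_CPU_DETECTION`, each `native_*` name is a macro. This sketch assumes the macro starts out pointing at a portable implementation. Each architecture header whose compile-time feature macros are defined then re-points it with `#undef`/`#define`, as the Power and RISC-V headers above do. The following is a self-contained sketch of that override pattern. `sum_c`, `sum_fast` and `HAVE_FAST_SUM` are hypothetical stand-ins, not zlib-ng names.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback, always available. */
static uint32_t sum_c(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    while (len--)
        s += *buf++;
    return s;
}

/* Stand-in for an accelerated variant: same result, different symbol. */
static uint32_t sum_fast(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += buf[i];
    return s;
}

/* "Generic header": default mapping. */
#define native_sum sum_c

/* "Arch header": override only when the feature is known at compile time
 * (normally defined by the build system, hard-coded here for the sketch). */
#define HAVE_FAST_SUM 1
#if defined(HAVE_FAST_SUM)
#  undef native_sum
#  define native_sum sum_fast
#endif
```

Because the choice is made by the preprocessor, a call through `native_sum` compiles to a direct call with no function-pointer table. That is the point of disabling runtime detection.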

@ -8,18 +8,16 @@
#include <riscv_vector.h>
#include "../../zbuild.h"
#include "../../deflate.h"
#include "zbuild.h"
#include "deflate.h"
static inline void slide_hash_chain(Pos *table, uint32_t entries, uint16_t wsize) {
size_t vl;
while (entries > 0) {
vl = __riscv_vsetvl_e16m4(entries);
vuint16m4_t v_tab = __riscv_vle16_v_u16m4(table, vl);
vuint16m4_t v_diff = __riscv_vsub_vx_u16m4(v_tab, wsize, vl);
vbool4_t mask = __riscv_vmsltu_vx_u16m4_b4(v_tab, wsize, vl);
v_tab = __riscv_vmerge_vxm_u16m4(v_diff, 0, mask, vl);
__riscv_vse16_v_u16m4(table, v_tab, vl);
vuint16m4_t v_diff = __riscv_vssubu_vx_u16m4(v_tab, wsize, vl);
__riscv_vse16_v_u16m4(table, v_diff, vl);
table += vl, entries -= vl;
}
}
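The rewritten `slide_hash_chain()` drops the compare/subtract/merge sequence in favor of a single saturating unsigned subtraction (`vssubu`). Each entry becomes `entry - wsize`, clamped at 0. Below is a scalar sketch of the same per-element rule, which can serve as a reference when testing the vector loop. `Pos` is assumed to be zlib-ng's 16-bit unsigned position type.

```c
#include <stdint.h>

typedef uint16_t Pos; /* assumption: matches zlib-ng's 16-bit Pos */

/* Scalar equivalent of the RVV loop: table[i] = max(table[i] - wsize, 0). */
static void slide_hash_chain_scalar(Pos *table, uint32_t entries, uint16_t wsize) {
    for (uint32_t i = 0; i < entries; i++)
        table[i] = (Pos)(table[i] >= wsize ? table[i] - wsize : 0);
}
```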

3rdparty/zlib-ng/arch/s390/Makefile.in

@ -0,0 +1,48 @@
# Makefile for zlib-ng
# Copyright (C) 1995-2013 Jean-loup Gailly, Mark Adler
# For conditions of distribution and use, see copyright notice in zlib.h
CC=
CFLAGS=
SFLAGS=
INCLUDES=
SUFFIX=
VGFMAFLAG=
NOLTOFLAG=
SRCDIR=.
SRCTOP=../..
TOPDIR=$(SRCTOP)
s390_features.o:
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/s390_features.c
s390_features.lo:
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/s390_features.c
dfltcc_deflate.o:
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/dfltcc_deflate.c
dfltcc_deflate.lo:
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/dfltcc_deflate.c
dfltcc_inflate.o:
$(CC) $(CFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/dfltcc_inflate.c
dfltcc_inflate.lo:
$(CC) $(SFLAGS) $(INCLUDES) -c -o $@ $(SRCDIR)/dfltcc_inflate.c
crc32-vx.o:
$(CC) $(CFLAGS) $(VGFMAFLAG) $(NOLTOFLAG) $(INCLUDES) -c -o $@ $(SRCDIR)/crc32-vx.c
crc32-vx.lo:
$(CC) $(SFLAGS) $(VGFMAFLAG) $(NOLTOFLAG) $(INCLUDES) -c -o $@ $(SRCDIR)/crc32-vx.c
mostlyclean: clean
clean:
rm -f *.o *.lo *~
rm -rf objs
rm -f *.gcda *.gcno *.gcov
distclean: clean
rm -f Makefile

3rdparty/zlib-ng/arch/s390/README.md

@ -0,0 +1,277 @@
# Introduction
This directory contains SystemZ deflate hardware acceleration support.
It can be enabled by running `./configure --with-dfltcc-deflate --with-dfltcc-inflate`
followed by `make`, or `cmake -DWITH_DFLTCC_DEFLATE=1 -DWITH_DFLTCC_INFLATE=1 .`
followed by `make`.
When built like this, zlib-ng compresses using hardware on level 1
and using software on all other levels. Decompression always happens
in hardware. To enable hardware compression for levels 1-6 as well
(i.e. so that it is also used at the default compression level, 6), add
`-DDFLTCC_LEVEL_MASK=0x7e` to CFLAGS when building zlib-ng.
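`DFLTCC_LEVEL_MASK` is a bitmap indexed by compression level. `dfltcc_can_deflate_with_params()` checks it as `level_mask & (1 << level)`, so `0x7e` (`0b01111110`) selects levels 1 through 6. The following is a short sketch for building and testing such a mask. The helper names are hypothetical.

```c
#include <stdint.h>

/* Set bits lo..hi (inclusive), one bit per compression level. */
static uint16_t level_mask_range(int lo, int hi) {
    uint16_t mask = 0;
    for (int level = lo; level <= hi; level++)
        mask |= (uint16_t)(1u << level);
    return mask;
}

/* Same membership test as dfltcc_can_deflate_with_params(). */
static int level_uses_dfltcc(uint16_t mask, int level) {
    return (mask & (1u << level)) != 0;
}
```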
SystemZ deflate hardware acceleration is available on [IBM z15](
https://www.ibm.com/products/z15) and newer machines under the name [
"Integrated Accelerator for zEnterprise Data Compression"](
https://www.ibm.com/support/z-content-solutions/compression/). The
programming interface to it is a machine instruction called DEFLATE
CONVERSION CALL (DFLTCC). It is documented in Chapter 26 of [Principles
of Operation](https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf). Both
the code and the rest of this document refer to this feature simply as
"DFLTCC".
# Performance
Performance figures are published [here](
https://github.com/iii-i/zlib-ng/wiki/Performance-with-dfltcc-patch-applied-and-dfltcc-support-built-on-dfltcc-enabled-machine
). The compression speed-up can be as high as 110x and the decompression
speed-up can be as high as 15x.
# Limitations
Two DFLTCC compression calls with identical inputs are not guaranteed to
produce identical outputs. Therefore, care should be taken when using
hardware compression if reproducible results are desired. In
particular, the zlib-ng-specific `zng_deflateSetParams()` call allows setting
the `Z_DEFLATE_REPRODUCIBLE` parameter, which disables DFLTCC support for a
particular stream.
DFLTCC does not support every zlib-ng feature, in particular:
* `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
* `inflateMark()`
* `inflatePrime()`
* `inflateSyncPoint()`
When used, these functions either switch to software or, if this is
not possible, fail gracefully.
# Code structure
All SystemZ-specific code lives in the `arch/s390` directory and is
integrated with the rest of zlib-ng using hook macros.
## Hook macros
DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer, and a window. Parameter blocks are stored alongside zlib states,
buffers are forwarded from the caller, and the window, which must be
4k-aligned and is always 64k large, is managed using the `PAD_WINDOW()`,
`WINDOW_PAD_SIZE`, `HINT_ALIGNED_WINDOW`, `DEFLATE_ADJUST_WINDOW_SIZE()`
and `INFLATE_ADJUST_WINDOW_SIZE()` hooks.
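The window constraints above (4k alignment, 64k size) can be met with the usual over-allocate-and-round-up technique. The following is a minimal sketch using plain `malloc()`. It does not reproduce zlib-ng's actual `PAD_WINDOW()`/`WINDOW_PAD_SIZE` definitions, and the helper name `aligned_window_alloc` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define DFLTCC_WINDOW_SIZE  65536u /* 64k, per the text above */
#define DFLTCC_WINDOW_ALIGN 4096u  /* 4k alignment, per the text above */

/* Returns a 4k-aligned pointer to a 64k window; *raw receives the pointer to free(). */
static unsigned char *aligned_window_alloc(void **raw) {
    *raw = malloc(DFLTCC_WINDOW_SIZE + DFLTCC_WINDOW_ALIGN - 1);
    if (*raw == NULL)
        return NULL;
    uintptr_t p = (uintptr_t)*raw;
    /* Round up to the next multiple of the alignment (a power of two). */
    p = (p + DFLTCC_WINDOW_ALIGN - 1) & ~(uintptr_t)(DFLTCC_WINDOW_ALIGN - 1);
    return (unsigned char *)p;
}
```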
Software and hardware window formats do not match; therefore,
`deflateSetDictionary()`, `deflateGetDictionary()`, `inflateSetDictionary()`
and `inflateGetDictionary()` need special handling, which is triggered using
`DEFLATE_SET_DICTIONARY_HOOK()`, `DEFLATE_GET_DICTIONARY_HOOK()`,
`INFLATE_SET_DICTIONARY_HOOK()` and `INFLATE_GET_DICTIONARY_HOOK()` macros.
`deflateResetKeep()` and `inflateResetKeep()` update the DFLTCC
parameter block using `DEFLATE_RESET_KEEP_HOOK()` and
`INFLATE_RESET_KEEP_HOOK()` macros.
`INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
calls gracefully fail.
`DEFLATE_PARAMS_HOOK()` implements switching between hardware and
software compression mid-stream using `deflateParams()`. Switching
normally entails flushing the current block, which might not be possible
in low-memory situations. `deflateParams()` uses the `DEFLATE_DONE()` hook
to detect and gracefully handle such situations.
The algorithm implemented in hardware has a different compression ratio
from the one implemented in software. The `DEFLATE_BOUND_ADJUST_COMPLEN()`
and `DEFLATE_NEED_CONSERVATIVE_BOUND()` macros make `deflateBound()`
return the correct results for the hardware implementation.
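The hardware worst case appears to be captured by the `DEFLATE_BOUND_COMPLEN()` macro in `dfltcc_common.h`. That bound covers one block header with a worst-case dynamic-Huffman table description, `DFLTCC_MAX_SYMBOL_BITS` (16) bits per input byte, an end-of-block symbol, and padding to a byte boundary. The sketch below restates that macro as a function, with constants copied from the header, so the arithmetic can be checked. How `DEFLATE_BOUND_ADJUST_COMPLEN()` combines it with the software bound is not shown in this diff.

```c
#include <stddef.h>

/* Constants copied from dfltcc_common.h. */
enum {
    BLOCK_HEADER_BITS = 3, HLITS_COUNT_BITS = 5, HDISTS_COUNT_BITS = 5,
    HCLENS_COUNT_BITS = 4, MAX_HCLENS = 19, HCLEN_BITS = 3,
    MAX_HLITS = 286, MAX_HDISTS = 30, MAX_HLIT_HDIST_BITS = 7,
    MAX_SYMBOL_BITS = 16, MAX_EOBS_BITS = 15, MAX_PADDING_BITS = 7
};

/* Function form of DEFLATE_BOUND_COMPLEN(): worst-case output bytes. */
static size_t dfltcc_bound_complen(size_t source_len) {
    size_t header_bits = BLOCK_HEADER_BITS + HLITS_COUNT_BITS + HDISTS_COUNT_BITS +
                         HCLENS_COUNT_BITS + MAX_HCLENS * HCLEN_BITS +
                         (MAX_HLITS + MAX_HDISTS) * MAX_HLIT_HDIST_BITS;
    size_t bits = header_bits + source_len * MAX_SYMBOL_BITS +
                  MAX_EOBS_BITS + MAX_PADDING_BITS;
    return bits >> 3; /* bits to bytes, as in the macro */
}
```

The fixed part works out to 2308 bits (288 bytes), so the bound is roughly two output bytes per input byte plus 288 bytes of overhead.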
Actual compression and decompression are handled by the `DEFLATE_HOOK()` and
`INFLATE_TYPEDO_HOOK()` macros. Since inflation with DFLTCC manages the
window on its own, calling `updatewindow()` is suppressed using the
`INFLATE_NEED_UPDATEWINDOW()` macro.
In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums; therefore, whenever it is used, software checksumming is
suppressed using the `DEFLATE_NEED_CHECKSUM()` and `INFLATE_NEED_CHECKSUM()`
macros.
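For reference, the Adler-32 value is defined by two running sums modulo 65521. Software would otherwise compute it, and DFLTCC produces it when Check Value Type selects Adler-32. The following is a straightforward, unoptimized sketch of that definition. zlib-ng's own `adler32` variants are vectorized and defer the modulo, but should produce the same values. The name `adler32_ref` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/* Update an Adler-32 value (initial value 1) with len bytes. */
static uint32_t adler32_ref(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffff, b = adler >> 16;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD; /* sum of bytes */
        b = (b + a) % ADLER_MOD;      /* sum of the running sums */
    }
    return (b << 16) | a;
}
```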
While software always produces reproducible compression results, this
is not the case for DFLTCC. Therefore, zlib-ng users can specify whether
reproducible compression results are required. This setting can always
be specified before compression begins, but not always in the middle of
a deflate stream; the exact conditions for that are determined by the
`DEFLATE_CAN_SET_REPRODUCIBLE()` macro.
## SystemZ-specific code
When zlib-ng is built with DFLTCC, the hooks described above are
converted to calls to functions, which are implemented in
`arch/s390/dfltcc_*` files. The functions can be grouped into three broad
categories:
* Base DFLTCC support, e.g. `dfltcc()`, which wraps the machine instruction.
* Translating between software and hardware data formats, e.g.
`dfltcc_deflate_set_dictionary()`.
* Translating between software and hardware state machines, e.g.
`dfltcc_deflate()` and `dfltcc_inflate()`.
The functions from the first two categories are fairly simple; however,
various quirks in both software and hardware state machines make the
functions from the third category quite complicated.
### `dfltcc_deflate()` function
This function is called by `deflate()` and has the following
responsibilities:
* Checking whether DFLTCC can be used with the current stream. If this
is not the case, then it returns `0`, making `deflate()` use some
other function in order to compress in software. Otherwise it returns
`1`.
* Block management and Huffman table generation. DFLTCC ends blocks only
when explicitly instructed to do so by the software. Furthermore,
whether to use fixed or dynamic Huffman tables must also be determined
by the software. Since looking at data in order to gather statistics
would negate performance benefits, the following approach is used: the
first `DFLTCC_FIRST_FHT_BLOCK_SIZE` bytes are placed into a fixed
block, and each subsequent `DFLTCC_BLOCK_SIZE` bytes are placed into
a new dynamic block.
* Writing the End-of-Block Symbol (EOBS). The Block Closing Control bit in
the parameter block instructs DFLTCC to write EOBS; however, certain
conditions must be met: the input data length must be non-zero or the
Continuation Flag must be set. In simpler terms, DFLTCC silently refuses
to write EOBS if that is the only thing it is asked to do. Since the
code has to be able to emit EOBS in software anyway, Block Closing
Control is never used, in order to avoid tricky corner cases. Whether
to write EOBS is instead controlled by the `soft_bcc` variable.
* Triggering block post-processing. Depending on the flush mode, `deflate()`
must perform various additional actions when a block or a stream ends.
`dfltcc_deflate()` informs `deflate()` about this using the
`block_state *result` parameter.
* Converting software state fields into hardware parameter block fields,
and vice versa. For example, `wrap` and Check Value Type or `bi_valid`
and Sub-Byte Boundary. Certain fields cannot be translated and must
persist untouched in the parameter block between calls, for example,
Continuation Flag or Continuation State Buffer.
* Handling flush modes and low-memory situations. These aspects are
quite intertwined and pervasive. The general idea here is that, while
the Continuation Flag is set, the code must not do anything in software,
whether explicitly (e.g. by calling `send_eobs()`) or implicitly (by
returning to `deflate()` with certain return and `*result` values).
* Ending streams. When a new block is started and flush mode is
`Z_FINISH`, Block Header Final parameter block bit is used to mark
this block as final. However, sometimes an empty final block is
needed, and, unfortunately, just like with EOBS, DFLTCC will silently
refuse to do this. The general idea of the DFLTCC implementation is to
rely as much as possible on the existing code. Here, to do this,
the code pretends that it does not support DFLTCC, which makes
`deflate()` call a software compression function, which writes an
empty final block. Whether this is required is controlled by the
`need_empty_block` variable.
* Error handling. This simply converts the
Operation-Ending-Supplemental Code to a string. Errors can only happen
due to things like memory corruption, and therefore they do not affect
the `deflate()` return code.
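The block-splitting rule from the list above can be modeled in a few lines. Bytes before `DFLTCC_FIRST_FHT_BLOCK_SIZE` go into the first (fixed-Huffman) block, and each subsequent `DFLTCC_BLOCK_SIZE` bytes start a new dynamic-Huffman block. This is a deliberately simplified sketch. The real `dfltcc_deflate()` also requires `avail_in >= dht_threshold` before opening a new block and reacts to flushes. The numeric values below are placeholders, not the constants from `dfltcc_detail.h`.

```c
#include <stddef.h>

/* Placeholder tuning values (hypothetical, for the model only). */
#define MODEL_FIRST_FHT_BLOCK_SIZE 4096u
#define MODEL_BLOCK_SIZE           1048576u

enum block_kind { BLOCK_FIXED, BLOCK_DYNAMIC };

/* Index of the deflate block that holds the input byte at `offset`. */
static size_t model_block_index(size_t offset) {
    if (offset < MODEL_FIRST_FHT_BLOCK_SIZE)
        return 0;
    return 1 + (offset - MODEL_FIRST_FHT_BLOCK_SIZE) / MODEL_BLOCK_SIZE;
}

/* Block 0 uses the fixed Huffman table; later blocks get a generated DHT. */
static enum block_kind model_block_kind(size_t offset) {
    return model_block_index(offset) == 0 ? BLOCK_FIXED : BLOCK_DYNAMIC;
}
```

Starting with a small fixed-table block avoids paying for DHT generation on short inputs. Large inputs still periodically get a fresh table that adapts to changing data.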
### `dfltcc_inflate()` function
This function is called by `inflate()` from the `TYPEDO` state (that is,
when all the metadata is parsed and the stream is positioned at the type
bits of deflate block header) and it's responsible for the following:
* Falling back to software when flush mode is `Z_BLOCK` or `Z_TREES`.
Unfortunately, there is no way to ask DFLTCC to stop decompressing on
block or tree boundary.
* `inflate()` decompression loop management. This is controlled using
the return value, which can be either `DFLTCC_INFLATE_BREAK` or
`DFLTCC_INFLATE_CONTINUE`.
* Converting software state fields into hardware parameter block fields,
and vice versa. For example, `whave` and History Length or `wnext` and
History Offset.
* Ending streams. This instructs `inflate()` to return `Z_STREAM_END`
and is controlled by `last` state field.
* Error handling. As with deflate, error handling comprises converting the
Operation-Ending-Supplemental Code to a string. Unlike deflate, errors
may happen due to bad inputs; therefore, they are propagated to
`inflate()` by setting the `mode` field to `MEM` or `BAD`.
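The `whave`/`wnext` translation maps zlib's circular window onto the hardware's 32 KiB history buffer (`HB_SIZE` in `dfltcc_common.h`). The following is a sketch of one consistent mapping. It assumes that History Offset points at the oldest history byte and that History Length equals `whave`. It illustrates the modular arithmetic and is not copied from `dfltcc_inflate.c`. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define HB_BITS 15
#define HB_SIZE (1u << HB_BITS)

struct sw_window { uint32_t whave, wnext; }; /* software: bytes held, next write position */
struct hw_history { uint32_t hl, ho; };      /* hardware: History Length, History Offset */

/* Software -> hardware: history starts whave bytes before wnext (mod HB_SIZE). */
static struct hw_history to_hw(struct sw_window w) {
    struct hw_history h = { w.whave, (w.wnext - w.whave) & (HB_SIZE - 1) };
    return h;
}

/* Hardware -> software: the next write position follows the newest history byte. */
static struct sw_window to_sw(struct hw_history h) {
    struct sw_window w = { h.hl, (h.ho + h.hl) & (HB_SIZE - 1) };
    return w;
}
```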
# Testing
Given the complexity of the DFLTCC machine instruction, it is not clear whether
QEMU TCG will ever support it. At the time of writing, one has to have
access to an IBM z15+ VM or LPAR in order to test DFLTCC support. Since
DFLTCC is a non-privileged instruction, neither special VM/LPAR
configuration nor root access is required.
zlib-ng CI uses an IBM-provided z15 self-hosted builder for the DFLTCC
testing. There is no official IBM Z GitHub Actions runner, so we build
one inspired by `anup-kodlekere/gaplib`.
Future updates to actions-runner might need an updated patch. The .NET
version number patch has been moved into a separate file to avoid
having to change the patch constantly.
## Configuring the builder.
### Install prerequisites.
```
sudo dnf install podman
```
### Add actions-runner service.
```
sudo cp self-hosted-builder/actions-runner.service /etc/systemd/system/
sudo systemctl daemon-reload
```
### Create a config file, needs github personal access token.
```
# Create file /etc/actions-runner
repo=<owner>/<name>
access_token=<ghp_***>
```
The access token should have the `repo` scope; consult
https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-a-repository
for details.
### Autostart actions-runner.
```
$ sudo systemctl enable --now actions-runner
```
## Rebuilding the container
In order to update the `gaplib-actions-runner` podman container, e.g. to get the
latest OS security fixes, follow these steps:
```
# Stop actions-runner service
sudo systemctl stop actions-runner
# Delete old container
sudo podman container rm gaplib-actions-runner
# Delete old image
sudo podman image rm localhost/zlib-ng/actions-runner
# Build image
sudo podman build --squash -f Dockerfile.zlib-ng --tag zlib-ng/actions-runner .
# Build container
sudo podman create --name=gaplib-actions-runner --env-file=/etc/actions-runner --init --interactive --volume=actions-runner-temp:/home/actions-runner zlib-ng/actions-runner
# Start actions-runner service
sudo systemctl start actions-runner
```

3rdparty/zlib-ng/arch/s390/crc32-vx.c

@ -0,0 +1,222 @@
/*
* Hardware-accelerated CRC-32 variants for Linux on z Systems
*
* Use the z/Architecture Vector Extension Facility to accelerate the
* computing of bitreflected CRC-32 checksums.
*
* This CRC-32 implementation algorithm is bitreflected and processes
* the least-significant bit first (Little-Endian).
*
* This code was originally written by Hendrik Brueckner
* <brueckner@linux.vnet.ibm.com> for use in the Linux kernel and has been
* relicensed under the zlib license.
*/
#include "zbuild.h"
#include "arch_functions.h"
#include <vecintrin.h>
typedef unsigned char uv16qi __attribute__((vector_size(16)));
typedef unsigned int uv4si __attribute__((vector_size(16)));
typedef unsigned long long uv2di __attribute__((vector_size(16)));
static uint32_t crc32_le_vgfm_16(uint32_t crc, const uint8_t *buf, size_t len) {
/*
* The CRC-32 constant block contains reduction constants to fold and
* process particular chunks of the input data stream in parallel.
*
* For the CRC-32 variants, the constants are precomputed according to
* these definitions:
*
* R1 = [(x^(4*128+32) mod P'(x) << 32)]' << 1
* R2 = [(x^(4*128-32) mod P'(x) << 32)]' << 1
* R3 = [(x^(128+32) mod P'(x) << 32)]' << 1
* R4 = [(x^(128-32) mod P'(x) << 32)]' << 1
* R5 = [(x^64 mod P'(x) << 32)]' << 1
* R6 = [(x^32 mod P'(x) << 32)]' << 1
*
* The bitreflected Barrett reduction constant, u', is defined as
* the bit reversal of floor(x^64 / P(x)).
*
* where P(x) is the polynomial in the normal domain and the P'(x) is the
* polynomial in the reversed (bitreflected) domain.
*
* CRC-32 (IEEE 802.3 Ethernet, ...) polynomials:
*
* P(x) = 0x04C11DB7
* P'(x) = 0xEDB88320
*/
const uv16qi perm_le2be = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0}; /* BE->LE mask */
const uv2di r2r1 = {0x1C6E41596, 0x154442BD4}; /* R2, R1 */
const uv2di r4r3 = {0x0CCAA009E, 0x1751997D0}; /* R4, R3 */
const uv2di r5 = {0, 0x163CD6124}; /* R5 */
const uv2di ru_poly = {0, 0x1F7011641}; /* u' */
const uv2di crc_poly = {0, 0x1DB710641}; /* P'(x) << 1 */
/*
* Load the initial CRC value.
*
* The CRC value is loaded into the rightmost word of the
* vector register and is later XORed with the LSB portion
* of the loaded input data.
*/
uv2di v0 = {0, 0};
v0 = (uv2di)vec_insert(crc, (uv4si)v0, 3);
/* Load a 64-byte data chunk and XOR with CRC */
uv2di v1 = vec_perm(((uv2di *)buf)[0], ((uv2di *)buf)[0], perm_le2be);
uv2di v2 = vec_perm(((uv2di *)buf)[1], ((uv2di *)buf)[1], perm_le2be);
uv2di v3 = vec_perm(((uv2di *)buf)[2], ((uv2di *)buf)[2], perm_le2be);
uv2di v4 = vec_perm(((uv2di *)buf)[3], ((uv2di *)buf)[3], perm_le2be);
v1 ^= v0;
buf += 64;
len -= 64;
while (len >= 64) {
/* Load the next 64-byte data chunk */
uv16qi part1 = vec_perm(((uv16qi *)buf)[0], ((uv16qi *)buf)[0], perm_le2be);
uv16qi part2 = vec_perm(((uv16qi *)buf)[1], ((uv16qi *)buf)[1], perm_le2be);
uv16qi part3 = vec_perm(((uv16qi *)buf)[2], ((uv16qi *)buf)[2], perm_le2be);
uv16qi part4 = vec_perm(((uv16qi *)buf)[3], ((uv16qi *)buf)[3], perm_le2be);
/*
* Perform a GF(2) multiplication of the doublewords in V1 with
* the R1 and R2 reduction constants in r2r1. The intermediate result
* is then folded (accumulated) with the next data chunk in PART1 and
* stored in V1. Repeat this step for the register contents
* in V2, V3, and V4 respectively.
*/
v1 = (uv2di)vec_gfmsum_accum_128(r2r1, v1, part1);
v2 = (uv2di)vec_gfmsum_accum_128(r2r1, v2, part2);
v3 = (uv2di)vec_gfmsum_accum_128(r2r1, v3, part3);
v4 = (uv2di)vec_gfmsum_accum_128(r2r1, v4, part4);
buf += 64;
len -= 64;
}
/*
* Fold V1 to V4 into a single 128-bit value in V1. Multiply V1 by R3
* and R4 (in r4r3) and accumulate the next 128-bit chunk until a single
* 128-bit value remains.
*/
v1 = (uv2di)vec_gfmsum_accum_128(r4r3, v1, (uv16qi)v2);
v1 = (uv2di)vec_gfmsum_accum_128(r4r3, v1, (uv16qi)v3);
v1 = (uv2di)vec_gfmsum_accum_128(r4r3, v1, (uv16qi)v4);
while (len >= 16) {
/* Load next data chunk */
v2 = vec_perm(*(uv2di *)buf, *(uv2di *)buf, perm_le2be);
/* Fold next data chunk */
v1 = (uv2di)vec_gfmsum_accum_128(r4r3, v1, (uv16qi)v2);
buf += 16;
len -= 16;
}
/*
* Set up a vector register for byte shifts. The shift value must
* be loaded in bits 1-4 in byte element 7 of a vector register.
* Shift by 8 bytes: 0x40
* Shift by 4 bytes: 0x20
*/
uv16qi v9 = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
v9 = vec_insert((unsigned char)0x40, v9, 7);
/*
* Prepare V0 for the next GF(2) multiplication: shift V0 by 8 bytes
* to move R4 into the rightmost doubleword and set the leftmost
* doubleword to 0x1.
*/
v0 = vec_srb(r4r3, (uv2di)v9);
v0[0] = 1;
/*
* Compute GF(2) product of V1 and V0. The rightmost doubleword
* of V1 is multiplied with R4. The leftmost doubleword of V1 is
* multiplied by 0x1 and is then XORed with rightmost product.
* Implicitly, the intermediate leftmost product becomes padded
*/
v1 = (uv2di)vec_gfmsum_128(v0, v1);
/*
* Now do the final 32-bit fold by multiplying the rightmost word
* in V1 with R5 and XOR the result with the remaining bits in V1.
*
* To achieve this by a single VGFMAG, right shift V1 by a word
* and store the result in V2 which is then accumulated. Use the
* vector unpack instruction to load the rightmost half of the
* doubleword into the rightmost doubleword element of V1; the other
* half is loaded in the leftmost doubleword.
* The vector register with CONST_R5 contains the R5 constant in the
* rightmost doubleword and the leftmost doubleword is zero to ignore
* the leftmost product of V1.
*/
v9 = vec_insert((unsigned char)0x20, v9, 7);
v2 = vec_srb(v1, (uv2di)v9);
v1 = vec_unpackl((uv4si)v1); /* Split rightmost doubleword */
v1 = (uv2di)vec_gfmsum_accum_128(r5, v1, (uv16qi)v2);
/*
* Apply a Barrett reduction to compute the final 32-bit CRC value.
*
* The input values to the Barrett reduction are the degree-63 polynomial
* in V1 (R(x)), degree-32 generator polynomial, and the reduction
* constant u. The Barrett reduction result is the CRC value of R(x) mod
* P(x).
*
* The Barrett reduction algorithm is defined as:
*
* 1. T1(x) = floor( R(x) / x^32 ) GF2MUL u
* 2. T2(x) = floor( T1(x) / x^32 ) GF2MUL P(x)
* 3. C(x) = R(x) XOR T2(x) mod x^32
*
* Note: The leftmost doubleword of vector register containing
* CONST_RU_POLY is zero and, thus, the intermediate GF(2) product
* is zero and does not contribute to the final result.
*/
/* T1(x) = floor( R(x) / x^32 ) GF2MUL u */
v2 = vec_unpackl((uv4si)v1);
v2 = (uv2di)vec_gfmsum_128(ru_poly, v2);
/*
* Compute the GF(2) product of the CRC polynomial with T1(x) in
* V2 and XOR the intermediate result, T2(x), with the value in V1.
* The final result is stored in word element 2 of V2.
*/
v2 = vec_unpackl((uv4si)v2);
v2 = (uv2di)vec_gfmsum_accum_128(crc_poly, v2, (uv16qi)v1);
return ((uv4si)v2)[2];
}
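The constants in `crc32_le_vgfm_16()` are bit-reflected forms of normal-domain values. The sketch below works in the normal (non-reflected) domain, with 64-bit carry-less multiplication done in software. It derives `u = floor(x^64 / P(x))`, applies the three Barrett steps from the comment above, and can be checked against plain polynomial long division. Reflecting `u` and `P(x)` over 33 bits should yield the `ru_poly` (`0x1F7011641`) and `crc_poly` (`0x1DB710641`) values used in the code. All names are hypothetical; this illustrates the arithmetic, not the vector implementation.

```c
#include <stdint.h>

/* Carry-less (GF(2)) multiply; callers keep deg(a) + deg(b) <= 63. */
static uint64_t clmul64(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* u = floor(x^64 / P(x)) for a degree-32 polynomial p (bit 32 set). */
static uint64_t barrett_u(uint64_t p) {
    uint64_t q = 0, rem = 0;
    for (int i = 64; i >= 0; i--) {
        rem = (rem << 1) | (uint64_t)(i == 64); /* next dividend bit of x^64 */
        if (rem & (1ULL << 32)) {               /* degree 32 reached: subtract P * x^i */
            rem ^= p;
            q |= 1ULL << i;                     /* only happens for i <= 32 */
        }
    }
    return q;
}

/* Barrett reduction of a degree-63 polynomial r modulo p, with u = barrett_u(p). */
static uint32_t barrett_mod(uint64_t r, uint64_t p, uint64_t u) {
    uint64_t t1 = clmul64(r >> 32, u);  /* T1 = floor(R / x^32) * u */
    uint64_t t2 = clmul64(t1 >> 32, p); /* T2 = floor(T1 / x^32) * P */
    return (uint32_t)(r ^ t2);          /* C  = (R xor T2) mod x^32 */
}

/* Reference: schoolbook polynomial long division. */
static uint32_t poly_mod(uint64_t r, uint64_t p) {
    for (int i = 63; i >= 32; i--)
        if ((r >> i) & 1)
            r ^= p << (i - 32);
    return (uint32_t)r;
}

/* Reverse the low 33 bits: the reflected-domain form of a 33-bit constant. */
static uint64_t reflect33(uint64_t v) {
    uint64_t r = 0;
    for (int i = 0; i < 33; i++)
        if ((v >> i) & 1)
            r |= 1ULL << (32 - i);
    return r;
}
```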
#define VX_MIN_LEN 64
#define VX_ALIGNMENT 16L
#define VX_ALIGN_MASK (VX_ALIGNMENT - 1)
uint32_t Z_INTERNAL crc32_s390_vx(uint32_t crc, const unsigned char *buf, size_t len) {
size_t prealign, aligned, remaining;
if (len < VX_MIN_LEN + VX_ALIGN_MASK)
return PREFIX(crc32_braid)(crc, buf, len);
if ((uintptr_t)buf & VX_ALIGN_MASK) {
prealign = VX_ALIGNMENT - ((uintptr_t)buf & VX_ALIGN_MASK);
len -= prealign;
crc = PREFIX(crc32_braid)(crc, buf, prealign);
buf += prealign;
}
aligned = len & ~VX_ALIGN_MASK;
remaining = len & VX_ALIGN_MASK;
crc = crc32_le_vgfm_16(crc ^ 0xffffffff, buf, aligned) ^ 0xffffffff;
if (remaining)
crc = PREFIX(crc32_braid)(crc, buf + aligned, remaining);
return crc;
}
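`crc32_s390_vx()` mixes two calling conventions. `crc32_braid()` takes and returns a finished CRC, while `crc32_le_vgfm_16()` works on the raw register, hence the `^ 0xffffffff` on entry and exit. The following bit-at-a-time sketch uses the reflected polynomial `0xEDB88320`, the `P'(x)` from the comment above. It shows that the head/body/tail split yields the same value as a single pass. The function names are hypothetical, and `crc32_raw` merely stands in for the vector path.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw reflected CRC-32 update with no pre/post inversion (like crc32_le_vgfm_16). */
static uint32_t crc32_raw(uint32_t crc, const uint8_t *buf, size_t len) {
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* zlib-style update: takes and returns a finished CRC (like crc32_braid). */
static uint32_t crc32_std(uint32_t crc, const uint8_t *buf, size_t len) {
    return crc32_raw(crc ^ 0xffffffffu, buf, len) ^ 0xffffffffu;
}

/* Same head/body/tail split as crc32_s390_vx(). */
static uint32_t crc32_split(uint32_t crc, const uint8_t *buf, size_t len) {
    size_t prealign = (16 - ((uintptr_t)buf & 15)) & 15; /* 0 if already aligned */
    if (prealign > len)
        prealign = len;
    crc = crc32_std(crc, buf, prealign);
    buf += prealign;
    len -= prealign;
    size_t aligned = len & ~(size_t)15, remaining = len & 15;
    crc = crc32_raw(crc ^ 0xffffffffu, buf, aligned) ^ 0xffffffffu; /* raw body */
    return crc32_std(crc, buf + aligned, remaining);
}
```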

@ -0,0 +1,119 @@
#ifndef DFLTCC_COMMON_H
#define DFLTCC_COMMON_H
#include "zutil.h"
/*
Parameter Block for Query Available Functions.
*/
struct dfltcc_qaf_param {
char fns[16];
char reserved1[8];
char fmts[2];
char reserved2[6];
} ALIGNED_(8);
/*
Parameter Block for Generate Dynamic-Huffman Table, Compress and Expand.
*/
struct dfltcc_param_v0 {
uint16_t pbvn; /* Parameter-Block-Version Number */
uint8_t mvn; /* Model-Version Number */
uint8_t ribm; /* Reserved for IBM use */
uint32_t reserved32 : 31;
uint32_t cf : 1; /* Continuation Flag */
uint8_t reserved64[8];
uint32_t nt : 1; /* New Task */
uint32_t reserved129 : 1;
uint32_t cvt : 1; /* Check Value Type */
uint32_t reserved131 : 1;
uint32_t htt : 1; /* Huffman-Table Type */
uint32_t bcf : 1; /* Block-Continuation Flag */
uint32_t bcc : 1; /* Block Closing Control */
uint32_t bhf : 1; /* Block Header Final */
uint32_t reserved136 : 1;
uint32_t reserved137 : 1;
uint32_t dhtgc : 1; /* DHT Generation Control */
uint32_t reserved139 : 5;
uint32_t reserved144 : 5;
uint32_t sbb : 3; /* Sub-Byte Boundary */
uint8_t oesc; /* Operation-Ending-Supplemental Code */
uint32_t reserved160 : 12;
uint32_t ifs : 4; /* Incomplete-Function Status */
uint16_t ifl; /* Incomplete-Function Length */
uint8_t reserved192[8];
uint8_t reserved256[8];
uint8_t reserved320[4];
uint16_t hl; /* History Length */
uint32_t reserved368 : 1;
uint16_t ho : 15; /* History Offset */
uint32_t cv; /* Check Value */
uint32_t eobs : 15; /* End-of-block Symbol */
uint32_t reserved431: 1;
uint8_t eobl : 4; /* End-of-block Length */
uint32_t reserved436 : 12;
uint32_t reserved448 : 4;
uint16_t cdhtl : 12; /* Compressed-Dynamic-Huffman Table
Length */
uint8_t reserved464[6];
uint8_t cdht[288]; /* Compressed-Dynamic-Huffman Table */
uint8_t reserved[24];
uint8_t ribm2[8]; /* Reserved for IBM use */
uint8_t csb[1152]; /* Continuation-State Buffer */
} ALIGNED_(8);
/*
Extension of inflate_state and deflate_state.
*/
struct dfltcc_state {
struct dfltcc_param_v0 param; /* Parameter block. */
struct dfltcc_qaf_param af; /* Available functions. */
char msg[64]; /* Buffer for strm->msg */
};
typedef struct {
struct dfltcc_state common;
uint16_t level_mask; /* Levels on which to use DFLTCC */
uint32_t block_size; /* New block each X bytes */
size_t block_threshold; /* New block after total_in > X */
uint32_t dht_threshold; /* New block only if avail_in >= X */
} arch_deflate_state;
typedef struct {
struct dfltcc_state common;
} arch_inflate_state;
/*
History buffer size.
*/
#define HB_BITS 15
#define HB_SIZE (1 << HB_BITS)
/*
Sizes of deflate block parts.
*/
#define DFLTCC_BLOCK_HEADER_BITS 3
#define DFLTCC_HLITS_COUNT_BITS 5
#define DFLTCC_HDISTS_COUNT_BITS 5
#define DFLTCC_HCLENS_COUNT_BITS 4
#define DFLTCC_MAX_HCLENS 19
#define DFLTCC_HCLEN_BITS 3
#define DFLTCC_MAX_HLITS 286
#define DFLTCC_MAX_HDISTS 30
#define DFLTCC_MAX_HLIT_HDIST_BITS 7
#define DFLTCC_MAX_SYMBOL_BITS 16
#define DFLTCC_MAX_EOBS_BITS 15
#define DFLTCC_MAX_PADDING_BITS 7
#define DEFLATE_BOUND_COMPLEN(source_len) \
((DFLTCC_BLOCK_HEADER_BITS + \
DFLTCC_HLITS_COUNT_BITS + \
DFLTCC_HDISTS_COUNT_BITS + \
DFLTCC_HCLENS_COUNT_BITS + \
DFLTCC_MAX_HCLENS * DFLTCC_HCLEN_BITS + \
(DFLTCC_MAX_HLITS + DFLTCC_MAX_HDISTS) * DFLTCC_MAX_HLIT_HDIST_BITS + \
(source_len) * DFLTCC_MAX_SYMBOL_BITS + \
DFLTCC_MAX_EOBS_BITS + \
DFLTCC_MAX_PADDING_BITS) >> 3)
#endif
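`dfltcc_can_deflate_with_params()` checks the `fns` and `fmts` bitmaps of `struct dfltcc_qaf_param` with `is_bit_set()`, which is defined outside this diff. z/Architecture numbers bits from the most significant end, so bit 0 is the top bit of byte 0. The following is a sketch under the assumption that these bitmaps follow that convention. The helper names are hypothetical, and `unsigned char` is used to keep the shifts well-defined.

```c
/* Test bit n of an MSB-first bitmap (bit 0 = most significant bit of byte 0). */
static int qaf_bit_set(const unsigned char *bits, int n) {
    return (bits[n / 8] >> (7 - n % 8)) & 1;
}

/* Set bit n with the same numbering, e.g. to build test fixtures. */
static void qaf_set_bit(unsigned char *bits, int n) {
    bits[n / 8] |= (unsigned char)(0x80u >> (n % 8));
}
```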

@ -0,0 +1,383 @@
/* dfltcc_deflate.c - IBM Z DEFLATE CONVERSION CALL compression support. */
/*
Use the following commands to build zlib-ng with DFLTCC compression support:
$ ./configure --with-dfltcc-deflate
or
$ cmake -DWITH_DFLTCC_DEFLATE=1 .
and then
$ make
*/
#include "zbuild.h"
#include "deflate.h"
#include "trees_emit.h"
#include "dfltcc_deflate.h"
#include "dfltcc_detail.h"
void Z_INTERNAL PREFIX(dfltcc_reset_deflate_state)(PREFIX3(streamp) strm) {
deflate_state *state = (deflate_state *)strm->state;
arch_deflate_state *dfltcc_state = &state->arch;
dfltcc_reset_state(&dfltcc_state->common);
/* Initialize tuning parameters */
dfltcc_state->level_mask = DFLTCC_LEVEL_MASK;
dfltcc_state->block_size = DFLTCC_BLOCK_SIZE;
dfltcc_state->block_threshold = DFLTCC_FIRST_FHT_BLOCK_SIZE;
dfltcc_state->dht_threshold = DFLTCC_DHT_MIN_SAMPLE_SIZE;
}
static inline int dfltcc_can_deflate_with_params(PREFIX3(streamp) strm, int level, uInt window_bits, int strategy,
int reproducible) {
deflate_state *state = (deflate_state *)strm->state;
arch_deflate_state *dfltcc_state = &state->arch;
/* Unsupported compression settings */
if ((dfltcc_state->level_mask & (1 << level)) == 0)
return 0;
if (window_bits != HB_BITS)
return 0;
if (strategy != Z_FIXED && strategy != Z_DEFAULT_STRATEGY)
return 0;
if (reproducible)
return 0;
/* Unsupported hardware */
if (!is_bit_set(dfltcc_state->common.af.fns, DFLTCC_GDHT) ||
!is_bit_set(dfltcc_state->common.af.fns, DFLTCC_CMPR) ||
!is_bit_set(dfltcc_state->common.af.fmts, DFLTCC_FMT0))
return 0;
return 1;
}
int Z_INTERNAL PREFIX(dfltcc_can_deflate)(PREFIX3(streamp) strm) {
deflate_state *state = (deflate_state *)strm->state;
return dfltcc_can_deflate_with_params(strm, state->level, state->w_bits, state->strategy, state->reproducible);
}
static inline void dfltcc_gdht(PREFIX3(streamp) strm) {
deflate_state *state = (deflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
size_t avail_in = strm->avail_in;
dfltcc(DFLTCC_GDHT, param, NULL, NULL, &strm->next_in, &avail_in, NULL);
}
static inline dfltcc_cc dfltcc_cmpr(PREFIX3(streamp) strm) {
deflate_state *state = (deflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
size_t avail_in = strm->avail_in;
size_t avail_out = strm->avail_out;
dfltcc_cc cc;
cc = dfltcc(DFLTCC_CMPR | HBT_CIRCULAR,
param, &strm->next_out, &avail_out,
&strm->next_in, &avail_in, state->window);
strm->total_in += (strm->avail_in - avail_in);
strm->total_out += (strm->avail_out - avail_out);
strm->avail_in = avail_in;
strm->avail_out = avail_out;
return cc;
}
static inline void send_eobs(PREFIX3(streamp) strm, const struct dfltcc_param_v0 *param) {
deflate_state *state = (deflate_state *)strm->state;
send_bits(state, PREFIX(bi_reverse)(param->eobs >> (15 - param->eobl), param->eobl), param->eobl, state->bi_buf, state->bi_valid);
PREFIX(flush_pending)(strm);
if (state->pending != 0) {
/* The remaining data is located in pending_out[0:pending]. If someone
* calls put_byte() - this might happen in deflate() - the byte will be
* placed into pending_buf[pending], which is incorrect. Move the
* remaining data to the beginning of pending_buf so that put_byte() is
* usable again.
*/
memmove(state->pending_buf, state->pending_out, state->pending);
state->pending_out = state->pending_buf;
}
#ifdef ZLIB_DEBUG
state->compressed_len += param->eobl;
#endif
}
int Z_INTERNAL PREFIX(dfltcc_deflate)(PREFIX3(streamp) strm, int flush, block_state *result) {
deflate_state *state = (deflate_state *)strm->state;
arch_deflate_state *dfltcc_state = &state->arch;
struct dfltcc_param_v0 *param = &dfltcc_state->common.param;
uInt masked_avail_in;
dfltcc_cc cc;
int need_empty_block;
int soft_bcc;
int no_flush;
if (!PREFIX(dfltcc_can_deflate)(strm)) {
/* Clear history. */
if (flush == Z_FULL_FLUSH)
param->hl = 0;
return 0;
}
again:
masked_avail_in = 0;
soft_bcc = 0;
no_flush = flush == Z_NO_FLUSH;
/* No input data. Return, except when Continuation Flag is set, which means
* that DFLTCC has buffered some output in the parameter block and needs to
* be called again in order to flush it.
*/
if (strm->avail_in == 0 && !param->cf) {
/* A block is still open, and the hardware does not support closing
* blocks without adding data. Thus, close it manually.
*/
if (!no_flush && param->bcf) {
send_eobs(strm, param);
param->bcf = 0;
}
/* Let one of deflate_* functions write a trailing empty block. */
if (flush == Z_FINISH)
return 0;
/* Clear history. */
if (flush == Z_FULL_FLUSH)
param->hl = 0;
/* Trigger block post-processing if necessary. */
*result = no_flush ? need_more : block_done;
return 1;
}
/* A non-BFINAL block is open and we are not about to close it, we have
* already compressed more than DFLTCC_BLOCK_SIZE bytes, and at least
* DFLTCC_DHT_MIN_SAMPLE_SIZE bytes of input are available. Open a new block
* with a new DHT in order to adapt to a possibly changed input data
* distribution.
*/
if (param->bcf && no_flush &&
strm->total_in > dfltcc_state->block_threshold &&
strm->avail_in >= dfltcc_state->dht_threshold) {
if (param->cf) {
/* We need to flush the DFLTCC buffer before writing the
* End-of-block Symbol. Mask the input data and proceed as usual.
*/
masked_avail_in += strm->avail_in;
strm->avail_in = 0;
no_flush = 0;
} else {
/* DFLTCC buffer is empty, so we can manually write the
* End-of-block Symbol right away.
*/
send_eobs(strm, param);
param->bcf = 0;
dfltcc_state->block_threshold = strm->total_in + dfltcc_state->block_size;
}
}
/* No space for compressed data. If we proceed, dfltcc_cmpr() will return
* DFLTCC_CC_OP1_TOO_SHORT without buffering header bits, but we will still
* set BCF=1, which is wrong. Avoid complications and return early.
*/
if (strm->avail_out == 0) {
*result = need_more;
return 1;
}
/* The caller gave us too much data. Pass only one block worth of
* uncompressed data to DFLTCC and mask the rest, so that on the next
* iteration we start a new block.
*/
if (no_flush && strm->avail_in > dfltcc_state->block_size) {
masked_avail_in += (strm->avail_in - dfltcc_state->block_size);
strm->avail_in = dfltcc_state->block_size;
}
/* When we have an open non-BFINAL deflate block and caller indicates that
* the stream is ending, we need to close an open deflate block and open a
* BFINAL one.
*/
need_empty_block = flush == Z_FINISH && param->bcf && !param->bhf;
/* Translate stream to parameter block */
param->cvt = state->wrap == 2 ? CVT_CRC32 : CVT_ADLER32;
if (!no_flush)
/* We need to close a block. Always do this in software - when there is
* no input data, the hardware will not honor BCC. */
soft_bcc = 1;
if (flush == Z_FINISH && !param->bcf)
/* We are about to open a BFINAL block, set Block Header Final bit
* until the stream ends.
*/
param->bhf = 1;
/* DFLTCC-CMPR will write to next_out, so make sure that buffers with
* higher precedence are empty.
*/
Assert(state->pending == 0, "There must be no pending bytes");
Assert(state->bi_valid < 8, "There must be less than 8 pending bits");
param->sbb = (unsigned int)state->bi_valid;
if (param->sbb > 0)
*strm->next_out = (unsigned char)state->bi_buf;
/* Honor history and check value */
param->nt = 0;
if (state->wrap == 1)
param->cv = strm->adler;
else if (state->wrap == 2)
param->cv = ZSWAP32(state->crc_fold.value);
/* When opening a block, choose a Huffman-Table Type */
if (!param->bcf) {
if (state->strategy == Z_FIXED || (strm->total_in == 0 && dfltcc_state->block_threshold > 0))
param->htt = HTT_FIXED;
else {
param->htt = HTT_DYNAMIC;
dfltcc_gdht(strm);
}
}
/* Deflate */
do {
cc = dfltcc_cmpr(strm);
if (strm->avail_in < 4096 && masked_avail_in > 0)
/* We are about to call DFLTCC with a small input buffer, which is
* inefficient. Since there is masked data, there will be at least
* one more DFLTCC call, so skip the current one and make the next
* one handle more data.
*/
break;
} while (cc == DFLTCC_CC_AGAIN);
/* Translate parameter block to stream */
strm->msg = oesc_msg(dfltcc_state->common.msg, param->oesc);
state->bi_valid = param->sbb;
if (state->bi_valid == 0)
state->bi_buf = 0; /* Avoid accessing next_out */
else
state->bi_buf = *strm->next_out & ((1 << state->bi_valid) - 1);
if (state->wrap == 1)
strm->adler = param->cv;
else if (state->wrap == 2)
state->crc_fold.value = ZSWAP32(param->cv);
/* Unmask the input data */
strm->avail_in += masked_avail_in;
masked_avail_in = 0;
/* If we encounter an error, it means there is a bug in DFLTCC call */
Assert(cc != DFLTCC_CC_OP2_CORRUPT || param->oesc == 0, "BUG");
/* Update Block-Continuation Flag. It will be used to check whether to call
* GDHT the next time.
*/
if (cc == DFLTCC_CC_OK) {
if (soft_bcc) {
send_eobs(strm, param);
param->bcf = 0;
dfltcc_state->block_threshold = strm->total_in + dfltcc_state->block_size;
} else
param->bcf = 1;
if (flush == Z_FINISH) {
if (need_empty_block)
/* Make the current deflate() call also close the stream */
return 0;
else {
bi_windup(state);
*result = finish_done;
}
} else {
if (flush == Z_FULL_FLUSH)
param->hl = 0; /* Clear history */
*result = flush == Z_NO_FLUSH ? need_more : block_done;
}
} else {
param->bcf = 1;
*result = need_more;
}
if (strm->avail_in != 0 && strm->avail_out != 0)
goto again; /* deflate() must use all input or all output */
return 1;
}
/*
Switching between hardware and software compression.
DFLTCC does not support all zlib settings, e.g. generation of non-compressed
blocks or alternative window sizes. When such settings are applied on the
fly with deflateParams, we need to convert between hardware and software
window formats.
*/
static int dfltcc_was_deflate_used(PREFIX3(streamp) strm) {
deflate_state *state = (deflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
return strm->total_in > 0 || param->nt == 0 || param->hl > 0;
}
int Z_INTERNAL PREFIX(dfltcc_deflate_params)(PREFIX3(streamp) strm, int level, int strategy, int *flush) {
deflate_state *state = (deflate_state *)strm->state;
int could_deflate = PREFIX(dfltcc_can_deflate)(strm);
int can_deflate = dfltcc_can_deflate_with_params(strm, level, state->w_bits, strategy, state->reproducible);
if (can_deflate == could_deflate)
/* We continue to work in the same mode - no changes needed */
return Z_OK;
if (!dfltcc_was_deflate_used(strm))
/* DFLTCC was not used yet - no changes needed */
return Z_OK;
/* For now, do not convert between window formats - simply get rid of the old data instead */
*flush = Z_FULL_FLUSH;
return Z_OK;
}
int Z_INTERNAL PREFIX(dfltcc_deflate_done)(PREFIX3(streamp) strm, int flush) {
deflate_state *state = (deflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
/* When deflate(Z_FULL_FLUSH) is called with small avail_out, it might
* close the block without resetting the compression state. Detect this
* situation and return that deflation is not done.
*/
if (flush == Z_FULL_FLUSH && strm->avail_out == 0)
return 0;
/* Return that deflation is not done if DFLTCC is used and either it
* buffered some data (Continuation Flag is set), or has not written EOBS
* yet (Block-Continuation Flag is set).
*/
return !PREFIX(dfltcc_can_deflate)(strm) || (!param->cf && !param->bcf);
}
int Z_INTERNAL PREFIX(dfltcc_can_set_reproducible)(PREFIX3(streamp) strm, int reproducible) {
deflate_state *state = (deflate_state *)strm->state;
return reproducible != state->reproducible && !dfltcc_was_deflate_used(strm);
}
/*
Preloading history.
*/
int Z_INTERNAL PREFIX(dfltcc_deflate_set_dictionary)(PREFIX3(streamp) strm,
const unsigned char *dictionary, uInt dict_length) {
deflate_state *state = (deflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
append_history(param, state->window, dictionary, dict_length);
state->strstart = 1; /* Add FDICT to zlib header */
state->block_start = state->strstart; /* Make deflate_stored happy */
return Z_OK;
}
int Z_INTERNAL PREFIX(dfltcc_deflate_get_dictionary)(PREFIX3(streamp) strm, unsigned char *dictionary, uInt *dict_length) {
deflate_state *state = (deflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
if (dictionary)
get_history(param, state->window, dictionary);
if (dict_length)
*dict_length = param->hl;
return Z_OK;
}
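The input-masking step in `dfltcc_deflate` caps `avail_in` at one block, hides the excess in `masked_avail_in`, and restores it after the hardware call, so each pass opens a new block. The sketch below models only that bookkeeping under stated assumptions: `toy_stream`, `mask_input`, `consume_all`, `unmask_input` and `compress_in_blocks` are hypothetical names, and the DFLTCC-CMPR call is replaced by a function that simply consumes every visible byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the avail_in/total_in fields of z_stream. */
struct toy_stream {
    size_t avail_in;  /* bytes visible to the compressor */
    size_t total_in;  /* bytes consumed so far */
};

/* Hide everything beyond one block; returns the hidden count (masked_avail_in). */
static size_t mask_input(struct toy_stream *s, size_t block_size) {
    size_t masked = 0;
    if (s->avail_in > block_size) {
        masked = s->avail_in - block_size;
        s->avail_in = block_size;
    }
    return masked;
}

/* Stand-in for the hardware call: consume all visible input. */
static void consume_all(struct toy_stream *s) {
    s->total_in += s->avail_in;
    s->avail_in = 0;
}

/* Make the hidden bytes visible again after the hardware call. */
static void unmask_input(struct toy_stream *s, size_t masked) {
    s->avail_in += masked;
}

/* Feed the input one block at a time; returns the number of passes. */
static int compress_in_blocks(struct toy_stream *s, size_t block_size) {
    int passes = 0;
    assert(block_size > 0);
    while (s->avail_in != 0) {
        size_t masked = mask_input(s, block_size);
        consume_all(s);
        unmask_input(s, masked);
        passes++;
    }
    return passes;
}
```

In `dfltcc_deflate` itself, masking by block size only happens when `flush == Z_NO_FLUSH`, the masked bytes are restored right after the DFLTCC-CMPR loop, and the `goto again` at the end starts the next pass while both `avail_in` and `avail_out` are nonzero.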

@@ -0,0 +1,58 @@
#ifndef DFLTCC_DEFLATE_H
#define DFLTCC_DEFLATE_H
#include "deflate.h"
#include "dfltcc_common.h"
void Z_INTERNAL PREFIX(dfltcc_reset_deflate_state)(PREFIX3(streamp));
int Z_INTERNAL PREFIX(dfltcc_can_deflate)(PREFIX3(streamp) strm);
int Z_INTERNAL PREFIX(dfltcc_deflate)(PREFIX3(streamp) strm, int flush, block_state *result);
int Z_INTERNAL PREFIX(dfltcc_deflate_params)(PREFIX3(streamp) strm, int level, int strategy, int *flush);
int Z_INTERNAL PREFIX(dfltcc_deflate_done)(PREFIX3(streamp) strm, int flush);
int Z_INTERNAL PREFIX(dfltcc_can_set_reproducible)(PREFIX3(streamp) strm, int reproducible);
int Z_INTERNAL PREFIX(dfltcc_deflate_set_dictionary)(PREFIX3(streamp) strm,
const unsigned char *dictionary, uInt dict_length);
int Z_INTERNAL PREFIX(dfltcc_deflate_get_dictionary)(PREFIX3(streamp) strm, unsigned char *dictionary, uInt* dict_length);
#define DEFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) \
do { \
if (PREFIX(dfltcc_can_deflate)((strm))) \
return PREFIX(dfltcc_deflate_set_dictionary)((strm), (dict), (dict_len)); \
} while (0)
#define DEFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) \
do { \
if (PREFIX(dfltcc_can_deflate)((strm))) \
return PREFIX(dfltcc_deflate_get_dictionary)((strm), (dict), (dict_len)); \
} while (0)
#define DEFLATE_RESET_KEEP_HOOK PREFIX(dfltcc_reset_deflate_state)
#define DEFLATE_PARAMS_HOOK(strm, level, strategy, hook_flush) \
do { \
int err; \
\
err = PREFIX(dfltcc_deflate_params)((strm), (level), (strategy), (hook_flush)); \
if (err == Z_STREAM_ERROR) \
return err; \
} while (0)
#define DEFLATE_DONE PREFIX(dfltcc_deflate_done)
#define DEFLATE_BOUND_ADJUST_COMPLEN(strm, complen, source_len) \
do { \
if (deflateStateCheck((strm)) || PREFIX(dfltcc_can_deflate)((strm))) \
(complen) = DEFLATE_BOUND_COMPLEN(source_len); \
} while (0)
#define DEFLATE_NEED_CONSERVATIVE_BOUND(strm) (PREFIX(dfltcc_can_deflate)((strm)))
#define DEFLATE_HOOK PREFIX(dfltcc_deflate)
#define DEFLATE_NEED_CHECKSUM(strm) (!PREFIX(dfltcc_can_deflate)((strm)))
#define DEFLATE_CAN_SET_REPRODUCIBLE PREFIX(dfltcc_can_set_reproducible)
#define DEFLATE_ADJUST_WINDOW_SIZE(n) MAX(n, HB_SIZE)
#endif
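Hooks such as `DEFLATE_SET_DICTIONARY_HOOK` are expanded inside the generic zlib-ng entry points and may `return` from the *enclosing* function when the hardware path applies. A minimal sketch of that pattern follows; `hw_enabled`, `hw_available`, `hw_set_dictionary`, `SET_DICTIONARY_HOOK` and `generic_set_dictionary` are hypothetical names standing in for the real zlib-ng functions.

```c
#include <assert.h>

static int hw_enabled = 0;              /* hypothetical hardware switch */
static int hw_calls = 0, sw_calls = 0;  /* counters for illustration */

static int hw_available(void) { return hw_enabled; }
static int hw_set_dictionary(int len) { hw_calls++; return len; }

/* Same shape as DEFLATE_SET_DICTIONARY_HOOK: may return from the caller. */
#define SET_DICTIONARY_HOOK(len) \
    do { \
        if (hw_available()) \
            return hw_set_dictionary((len)); \
    } while (0)

/* Generic entry point: the hook runs first, the software path otherwise. */
static int generic_set_dictionary(int len) {
    assert(len >= 0);  /* toy precondition */
    SET_DICTIONARY_HOOK(len);
    sw_calls++;        /* the software implementation would run here */
    return len;
}
```

The `do { ... } while (0)` wrapper makes each hook behave as a single statement, so it stays correct after an unbraced `if` and requires a trailing semicolon like a function call.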

@@ -0,0 +1,275 @@
#include "zbuild.h"
#include <stdio.h>
#ifdef HAVE_SYS_SDT_H
#include <sys/sdt.h>
#endif
/*
Tuning parameters.
*/
#ifndef DFLTCC_LEVEL_MASK
#define DFLTCC_LEVEL_MASK 0x2
#endif
#ifndef DFLTCC_BLOCK_SIZE
#define DFLTCC_BLOCK_SIZE 1048576
#endif
#ifndef DFLTCC_FIRST_FHT_BLOCK_SIZE
#define DFLTCC_FIRST_FHT_BLOCK_SIZE 4096
#endif
#ifndef DFLTCC_DHT_MIN_SAMPLE_SIZE
#define DFLTCC_DHT_MIN_SAMPLE_SIZE 4096
#endif
#ifndef DFLTCC_RIBM
#define DFLTCC_RIBM 0
#endif
#define static_assert(c, msg) __attribute__((unused)) static char static_assert_failed_ ## msg[c ? 1 : -1]
#define DFLTCC_SIZEOF_QAF 32
static_assert(sizeof(struct dfltcc_qaf_param) == DFLTCC_SIZEOF_QAF, qaf);
static inline int is_bit_set(const char *bits, int n) {
return bits[n / 8] & (1 << (7 - (n % 8)));
}
static inline void clear_bit(char *bits, int n) {
bits[n / 8] &= ~(1 << (7 - (n % 8)));
}
#define DFLTCC_FACILITY 151
static inline int is_dfltcc_enabled(void) {
uint64_t facilities[(DFLTCC_FACILITY / 64) + 1];
Z_REGISTER uint8_t r0 __asm__("r0");
memset(facilities, 0, sizeof(facilities));
r0 = sizeof(facilities) / sizeof(facilities[0]) - 1;
/* STFLE is supported since z9-109 and only in z/Architecture mode. When
* compiling with -m31, gcc defaults to ESA mode, however, since the kernel
* is 64-bit, it's always z/Architecture mode at runtime.
*/
__asm__ volatile(
#ifndef __clang__
".machinemode push\n"
".machinemode zarch\n"
#endif
"stfle %[facilities]\n"
#ifndef __clang__
".machinemode pop\n"
#endif
: [facilities] "=Q" (facilities), [r0] "+r" (r0) :: "cc");
return is_bit_set((const char *)facilities, DFLTCC_FACILITY);
}
#define DFLTCC_FMT0 0
#define CVT_CRC32 0
#define CVT_ADLER32 1
#define HTT_FIXED 0
#define HTT_DYNAMIC 1
#define DFLTCC_SIZEOF_GDHT_V0 384
#define DFLTCC_SIZEOF_CMPR_XPND_V0 1536
static_assert(offsetof(struct dfltcc_param_v0, csb) == DFLTCC_SIZEOF_GDHT_V0, gdht_v0);
static_assert(sizeof(struct dfltcc_param_v0) == DFLTCC_SIZEOF_CMPR_XPND_V0, cmpr_xpnd_v0);
static inline z_const char *oesc_msg(char *buf, int oesc) {
if (oesc == 0x00)
return NULL; /* Successful completion */
else {
sprintf(buf, "Operation-Ending-Supplemental Code is 0x%.2X", oesc);
return buf;
}
}
/*
C wrapper for the DEFLATE CONVERSION CALL instruction.
*/
typedef enum {
DFLTCC_CC_OK = 0,
DFLTCC_CC_OP1_TOO_SHORT = 1,
DFLTCC_CC_OP2_TOO_SHORT = 2,
DFLTCC_CC_OP2_CORRUPT = 2,
DFLTCC_CC_AGAIN = 3,
} dfltcc_cc;
#define DFLTCC_QAF 0
#define DFLTCC_GDHT 1
#define DFLTCC_CMPR 2
#define DFLTCC_XPND 4
#define HBT_CIRCULAR (1 << 7)
#define DFLTCC_FN_MASK ((1 << 7) - 1)
/* Return lengths of high (starting at param->ho) and low (starting at 0) fragments of the circular history buffer. */
static inline void get_history_lengths(struct dfltcc_param_v0 *param, size_t *hl_high, size_t *hl_low) {
*hl_high = MIN(param->hl, HB_SIZE - param->ho);
*hl_low = param->hl - *hl_high;
}
/* Notify instrumentation about an upcoming read/write access to the circular history buffer. */
static inline void instrument_read_write_hist(struct dfltcc_param_v0 *param, void *hist) {
size_t hl_high, hl_low;
get_history_lengths(param, &hl_high, &hl_low);
instrument_read_write(hist + param->ho, hl_high);
instrument_read_write(hist, hl_low);
}
/* Notify MSan about a completed write to the circular history buffer. */
static inline void msan_unpoison_hist(struct dfltcc_param_v0 *param, void *hist) {
size_t hl_high, hl_low;
get_history_lengths(param, &hl_high, &hl_low);
__msan_unpoison(hist + param->ho, hl_high);
__msan_unpoison(hist, hl_low);
}
static inline dfltcc_cc dfltcc(int fn, void *param,
unsigned char **op1, size_t *len1,
z_const unsigned char **op2, size_t *len2, void *hist) {
unsigned char *t2 = op1 ? *op1 : NULL;
unsigned char *orig_t2 = t2;
size_t t3 = len1 ? *len1 : 0;
z_const unsigned char *t4 = op2 ? *op2 : NULL;
size_t t5 = len2 ? *len2 : 0;
Z_REGISTER int r0 __asm__("r0");
Z_REGISTER void *r1 __asm__("r1");
Z_REGISTER unsigned char *r2 __asm__("r2");
Z_REGISTER size_t r3 __asm__("r3");
Z_REGISTER z_const unsigned char *r4 __asm__("r4");
Z_REGISTER size_t r5 __asm__("r5");
int cc;
/* Insert pre-instrumentation for DFLTCC. */
switch (fn & DFLTCC_FN_MASK) {
case DFLTCC_QAF:
instrument_write(param, DFLTCC_SIZEOF_QAF);
break;
case DFLTCC_GDHT:
instrument_read_write(param, DFLTCC_SIZEOF_GDHT_V0);
instrument_read(t4, t5);
break;
case DFLTCC_CMPR:
case DFLTCC_XPND:
instrument_read_write(param, DFLTCC_SIZEOF_CMPR_XPND_V0);
instrument_read(t4, t5);
instrument_write(t2, t3);
instrument_read_write_hist(param, hist);
break;
}
r0 = fn; r1 = param; r2 = t2; r3 = t3; r4 = t4; r5 = t5;
__asm__ volatile(
#ifdef HAVE_SYS_SDT_H
STAP_PROBE_ASM(zlib, dfltcc_entry, STAP_PROBE_ASM_TEMPLATE(5))
#endif
".insn rrf,0xb9390000,%[r2],%[r4],%[hist],0\n"
#ifdef HAVE_SYS_SDT_H
STAP_PROBE_ASM(zlib, dfltcc_exit, STAP_PROBE_ASM_TEMPLATE(5))
#endif
"ipm %[cc]\n"
: [r2] "+r" (r2)
, [r3] "+r" (r3)
, [r4] "+r" (r4)
, [r5] "+r" (r5)
, [cc] "=r" (cc)
: [r0] "r" (r0)
, [r1] "r" (r1)
, [hist] "r" (hist)
#ifdef HAVE_SYS_SDT_H
, STAP_PROBE_ASM_OPERANDS(5, r2, r3, r4, r5, hist)
#endif
: "cc", "memory");
t2 = r2; t3 = r3; t4 = r4; t5 = r5;
/* Insert post-instrumentation for DFLTCC. */
switch (fn & DFLTCC_FN_MASK) {
case DFLTCC_QAF:
__msan_unpoison(param, DFLTCC_SIZEOF_QAF);
break;
case DFLTCC_GDHT:
__msan_unpoison(param, DFLTCC_SIZEOF_GDHT_V0);
break;
case DFLTCC_CMPR:
__msan_unpoison(param, DFLTCC_SIZEOF_CMPR_XPND_V0);
__msan_unpoison(orig_t2, t2 - orig_t2 + (((struct dfltcc_param_v0 *)param)->sbb == 0 ? 0 : 1));
msan_unpoison_hist(param, hist);
break;
case DFLTCC_XPND:
__msan_unpoison(param, DFLTCC_SIZEOF_CMPR_XPND_V0);
__msan_unpoison(orig_t2, t2 - orig_t2);
msan_unpoison_hist(param, hist);
break;
}
if (op1)
*op1 = t2;
if (len1)
*len1 = t3;
if (op2)
*op2 = t4;
if (len2)
*len2 = t5;
return (cc >> 28) & 3;
}
#define ALIGN_UP(p, size) (__typeof__(p))(((uintptr_t)(p) + ((size) - 1)) & ~((size) - 1))
static inline void dfltcc_reset_state(struct dfltcc_state *dfltcc_state) {
/* Initialize available functions */
if (is_dfltcc_enabled()) {
dfltcc(DFLTCC_QAF, &dfltcc_state->param, NULL, NULL, NULL, NULL, NULL);
memmove(&dfltcc_state->af, &dfltcc_state->param, sizeof(dfltcc_state->af));
} else
memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
/* Initialize parameter block */
memset(&dfltcc_state->param, 0, sizeof(dfltcc_state->param));
dfltcc_state->param.nt = 1;
dfltcc_state->param.ribm = DFLTCC_RIBM;
}
static inline void dfltcc_copy_state(void *dst, const void *src, uInt size, uInt extension_size) {
memcpy(dst, src, ALIGN_UP(size, 8) + extension_size);
}
static inline void append_history(struct dfltcc_param_v0 *param, unsigned char *history,
const unsigned char *buf, uInt count) {
size_t offset;
size_t n;
/* Do not use more than 32K */
if (count > HB_SIZE) {
buf += count - HB_SIZE;
count = HB_SIZE;
}
offset = (param->ho + param->hl) % HB_SIZE;
if (offset + count <= HB_SIZE)
/* Circular history buffer does not wrap - copy one chunk */
memcpy(history + offset, buf, count);
else {
/* Circular history buffer wraps - copy two chunks */
n = HB_SIZE - offset;
memcpy(history + offset, buf, n);
memcpy(history, buf + n, count - n);
}
n = param->hl + count;
if (n <= HB_SIZE)
/* All history fits into buffer - no need to discard anything */
param->hl = n;
else {
/* History does not fit into buffer - discard extra bytes */
param->ho = (param->ho + (n - HB_SIZE)) % HB_SIZE;
param->hl = HB_SIZE;
}
}
static inline void get_history(struct dfltcc_param_v0 *param, const unsigned char *history,
unsigned char *buf) {
size_t hl_high, hl_low;
get_history_lengths(param, &hl_high, &hl_low);
memcpy(buf, history + param->ho, hl_high);
memcpy(buf + hl_high, history, hl_low);
}
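`append_history` and `get_history` maintain a circular buffer described by the offset of the oldest byte (`ho`) and the number of valid bytes (`hl`). The sketch below reproduces that logic, assuming a hypothetical 8-byte buffer (`TOY_HB_SIZE`) in place of `HB_SIZE` and a minimal `toy_param` struct in place of `struct dfltcc_param_v0`, so that wraparound is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HB_SIZE 8  /* hypothetical; the real HB_SIZE is defined elsewhere */
#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

struct toy_param {
    size_t ho;  /* offset of the oldest history byte */
    size_t hl;  /* number of valid history bytes */
};

static void toy_append_history(struct toy_param *p, unsigned char *hist,
                               const unsigned char *buf, size_t count) {
    size_t offset, n;
    if (count > TOY_HB_SIZE) {  /* keep only the newest bytes */
        buf += count - TOY_HB_SIZE;
        count = TOY_HB_SIZE;
    }
    offset = (p->ho + p->hl) % TOY_HB_SIZE;
    if (offset + count <= TOY_HB_SIZE) {
        memcpy(hist + offset, buf, count);  /* no wrap: one chunk */
    } else {
        n = TOY_HB_SIZE - offset;           /* wrap: two chunks */
        memcpy(hist + offset, buf, n);
        memcpy(hist, buf + n, count - n);
    }
    n = p->hl + count;
    if (n <= TOY_HB_SIZE) {
        p->hl = n;
    } else {                                /* overflow: drop the oldest bytes */
        p->ho = (p->ho + (n - TOY_HB_SIZE)) % TOY_HB_SIZE;
        p->hl = TOY_HB_SIZE;
    }
    assert(p->ho < TOY_HB_SIZE && p->hl <= TOY_HB_SIZE);
}

/* Copy history out oldest-first: high fragment [ho, end), then low [0, ...). */
static void toy_get_history(const struct toy_param *p, const unsigned char *hist,
                            unsigned char *out) {
    size_t hl_high = TOY_MIN(p->hl, TOY_HB_SIZE - p->ho);
    size_t hl_low = p->hl - hl_high;
    memcpy(out, hist + p->ho, hl_high);
    memcpy(out + hl_high, hist, hl_low);
}
```

After appending "abcdef" and then "ghij" to the empty 8-byte buffer, the second write wraps, `ho` advances to 2, and `toy_get_history` returns the newest eight bytes, "cdefghij", in order.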

@@ -0,0 +1,191 @@
/* dfltcc_inflate.c - IBM Z DEFLATE CONVERSION CALL decompression support. */
/*
Use the following commands to build zlib-ng with DFLTCC decompression support:
$ ./configure --with-dfltcc-inflate
or
$ cmake -DWITH_DFLTCC_INFLATE=1 .
and then
$ make
*/
#include "zbuild.h"
#include "zutil.h"
#include "inftrees.h"
#include "inflate.h"
#include "dfltcc_inflate.h"
#include "dfltcc_detail.h"
void Z_INTERNAL PREFIX(dfltcc_reset_inflate_state)(PREFIX3(streamp) strm) {
struct inflate_state *state = (struct inflate_state *)strm->state;
dfltcc_reset_state(&state->arch.common);
}
int Z_INTERNAL PREFIX(dfltcc_can_inflate)(PREFIX3(streamp) strm) {
struct inflate_state *state = (struct inflate_state *)strm->state;
struct dfltcc_state *dfltcc_state = &state->arch.common;
/* Supported only if the hardware implements DFLTCC-XPND and parameter block format 0 */
return is_bit_set(dfltcc_state->af.fns, DFLTCC_XPND) && is_bit_set(dfltcc_state->af.fmts, DFLTCC_FMT0);
}
static inline dfltcc_cc dfltcc_xpnd(PREFIX3(streamp) strm) {
struct inflate_state *state = (struct inflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
size_t avail_in = strm->avail_in;
size_t avail_out = strm->avail_out;
dfltcc_cc cc;
cc = dfltcc(DFLTCC_XPND | HBT_CIRCULAR,
param, &strm->next_out, &avail_out,
&strm->next_in, &avail_in, state->window);
strm->avail_in = avail_in;
strm->avail_out = avail_out;
return cc;
}
dfltcc_inflate_action Z_INTERNAL PREFIX(dfltcc_inflate)(PREFIX3(streamp) strm, int flush, int *ret) {
struct inflate_state *state = (struct inflate_state *)strm->state;
struct dfltcc_state *dfltcc_state = &state->arch.common;
struct dfltcc_param_v0 *param = &dfltcc_state->param;
dfltcc_cc cc;
if (flush == Z_BLOCK || flush == Z_TREES) {
/* DFLTCC does not support stopping on block boundaries */
if (PREFIX(dfltcc_inflate_disable)(strm)) {
*ret = Z_STREAM_ERROR;
return DFLTCC_INFLATE_BREAK;
} else
return DFLTCC_INFLATE_SOFTWARE;
}
if (state->last) {
if (state->bits != 0) {
strm->next_in++;
strm->avail_in--;
state->bits = 0;
}
state->mode = CHECK;
return DFLTCC_INFLATE_CONTINUE;
}
if (strm->avail_in == 0 && !param->cf)
return DFLTCC_INFLATE_BREAK;
/* if window not in use yet, initialize */
if (state->wsize == 0)
state->wsize = 1U << state->wbits;
/* Translate stream to parameter block */
param->cvt = ((state->wrap & 4) && state->flags) ? CVT_CRC32 : CVT_ADLER32;
param->sbb = state->bits;
if (param->hl)
param->nt = 0; /* Honor history for the first block */
if (state->wrap & 4)
param->cv = state->flags ? ZSWAP32(state->check) : state->check;
/* Inflate */
do {
cc = dfltcc_xpnd(strm);
} while (cc == DFLTCC_CC_AGAIN);
/* Translate parameter block to stream */
strm->msg = oesc_msg(dfltcc_state->msg, param->oesc);
state->last = cc == DFLTCC_CC_OK;
state->bits = param->sbb;
if (state->wrap & 4)
strm->adler = state->check = state->flags ? ZSWAP32(param->cv) : param->cv;
if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
/* Report an error if stream is corrupted */
state->mode = BAD;
return DFLTCC_INFLATE_CONTINUE;
}
state->mode = TYPEDO;
/* Break if operands are exhausted, otherwise continue looping */
return (cc == DFLTCC_CC_OP1_TOO_SHORT || cc == DFLTCC_CC_OP2_TOO_SHORT) ?
DFLTCC_INFLATE_BREAK : DFLTCC_INFLATE_CONTINUE;
}
int Z_INTERNAL PREFIX(dfltcc_was_inflate_used)(PREFIX3(streamp) strm) {
struct inflate_state *state = (struct inflate_state *)strm->state;
return !state->arch.common.param.nt;
}
/*
Rotates a circular buffer.
The implementation is based on https://cplusplus.com/reference/algorithm/rotate/
*/
static void rotate(unsigned char *start, unsigned char *pivot, unsigned char *end) {
unsigned char *p = pivot;
unsigned char tmp;
while (p != start) {
tmp = *start;
*start = *p;
*p = tmp;
start++;
p++;
if (p == end)
p = pivot;
else if (start == pivot)
pivot = p;
}
}
int Z_INTERNAL PREFIX(dfltcc_inflate_disable)(PREFIX3(streamp) strm) {
struct inflate_state *state = (struct inflate_state *)strm->state;
struct dfltcc_state *dfltcc_state = &state->arch.common;
struct dfltcc_param_v0 *param = &dfltcc_state->param;
if (!PREFIX(dfltcc_can_inflate)(strm))
return 0;
if (PREFIX(dfltcc_was_inflate_used)(strm))
/* DFLTCC has already decompressed some data. Since there is not
* enough information to resume decompression in software, the call
* must fail.
*/
return 1;
/* DFLTCC was not used yet - decompress in software */
memset(&dfltcc_state->af, 0, sizeof(dfltcc_state->af));
/* Convert the window from the hardware to the software format */
rotate(state->window, state->window + param->ho, state->window + HB_SIZE);
state->whave = state->wnext = MIN(param->hl, state->wsize);
return 0;
}
/*
Preloading history.
*/
int Z_INTERNAL PREFIX(dfltcc_inflate_set_dictionary)(PREFIX3(streamp) strm,
const unsigned char *dictionary, uInt dict_length) {
struct inflate_state *state = (struct inflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
/* if window not in use yet, initialize */
if (state->wsize == 0)
state->wsize = 1U << state->wbits;
append_history(param, state->window, dictionary, dict_length);
state->havedict = 1;
return Z_OK;
}
int Z_INTERNAL PREFIX(dfltcc_inflate_get_dictionary)(PREFIX3(streamp) strm,
unsigned char *dictionary, uInt *dict_length) {
struct inflate_state *state = (struct inflate_state *)strm->state;
struct dfltcc_param_v0 *param = &state->arch.common.param;
if (dictionary && state->window)
get_history(param, state->window, dictionary);
if (dict_length)
*dict_length = param->hl;
return Z_OK;
}
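`dfltcc_inflate_disable` converts the hardware window, whose oldest byte sits at `param->ho`, into the linear layout expected by the software inflater by rotating `[window, window + HB_SIZE)` around `window + ho`. The sketch below keeps the loop of `rotate` unchanged (renamed `toy_rotate`) and adds a hypothetical `window_to_linear` helper with a precondition check; the buffer contents are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Same algorithm as rotate(): afterwards the byte that was at pivot is at
 * start, and the relative order of all bytes is preserved. */
static void toy_rotate(unsigned char *start, unsigned char *pivot, unsigned char *end) {
    unsigned char *p = pivot;
    unsigned char tmp;
    while (p != start) {
        tmp = *start;
        *start = *p;
        *p = tmp;
        start++;
        p++;
        if (p == end)
            p = pivot;
        else if (start == pivot)
            pivot = p;
    }
}

/* Hypothetical helper: linearize a circular window whose oldest byte is at ho. */
static void window_to_linear(unsigned char *win, size_t size, size_t ho) {
    assert(ho < size);  /* pivot == end would make toy_rotate access past the buffer */
    toy_rotate(win, win + ho, win + size);
}
```

Rotating in place avoids a temporary copy of the window; it is safe in the original because `param->ho` is always kept below `HB_SIZE`.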

@@ -0,0 +1,67 @@
#ifndef DFLTCC_INFLATE_H
#define DFLTCC_INFLATE_H
#include "dfltcc_common.h"
void Z_INTERNAL PREFIX(dfltcc_reset_inflate_state)(PREFIX3(streamp) strm);
int Z_INTERNAL PREFIX(dfltcc_can_inflate)(PREFIX3(streamp) strm);
typedef enum {
DFLTCC_INFLATE_CONTINUE,
DFLTCC_INFLATE_BREAK,
DFLTCC_INFLATE_SOFTWARE,
} dfltcc_inflate_action;
dfltcc_inflate_action Z_INTERNAL PREFIX(dfltcc_inflate)(PREFIX3(streamp) strm, int flush, int *ret);
int Z_INTERNAL PREFIX(dfltcc_was_inflate_used)(PREFIX3(streamp) strm);
int Z_INTERNAL PREFIX(dfltcc_inflate_disable)(PREFIX3(streamp) strm);
int Z_INTERNAL PREFIX(dfltcc_inflate_set_dictionary)(PREFIX3(streamp) strm,
const unsigned char *dictionary, uInt dict_length);
int Z_INTERNAL PREFIX(dfltcc_inflate_get_dictionary)(PREFIX3(streamp) strm,
unsigned char *dictionary, uInt* dict_length);
#define INFLATE_RESET_KEEP_HOOK PREFIX(dfltcc_reset_inflate_state)
#define INFLATE_PRIME_HOOK(strm, bits, value) \
do { if (PREFIX(dfltcc_inflate_disable)((strm))) return Z_STREAM_ERROR; } while (0)
#define INFLATE_TYPEDO_HOOK(strm, flush) \
if (PREFIX(dfltcc_can_inflate)((strm))) { \
dfltcc_inflate_action action; \
\
RESTORE(); \
action = PREFIX(dfltcc_inflate)((strm), (flush), &ret); \
LOAD(); \
if (action == DFLTCC_INFLATE_CONTINUE) \
break; \
else if (action == DFLTCC_INFLATE_BREAK) \
goto inf_leave; \
}
#define INFLATE_NEED_CHECKSUM(strm) (!PREFIX(dfltcc_can_inflate)((strm)))
#define INFLATE_NEED_UPDATEWINDOW(strm) (!PREFIX(dfltcc_can_inflate)((strm)))
#define INFLATE_MARK_HOOK(strm) \
do { \
if (PREFIX(dfltcc_was_inflate_used)((strm))) return -(1L << 16); \
} while (0)
#define INFLATE_SYNC_POINT_HOOK(strm) \
do { \
if (PREFIX(dfltcc_was_inflate_used)((strm))) return Z_STREAM_ERROR; \
} while (0)
#define INFLATE_SET_DICTIONARY_HOOK(strm, dict, dict_len) \
do { \
if (PREFIX(dfltcc_can_inflate)((strm))) \
return PREFIX(dfltcc_inflate_set_dictionary)((strm), (dict), (dict_len)); \
} while (0)
#define INFLATE_GET_DICTIONARY_HOOK(strm, dict, dict_len) \
do { \
if (PREFIX(dfltcc_can_inflate)((strm))) \
return PREFIX(dfltcc_inflate_get_dictionary)((strm), (dict), (dict_len)); \
} while (0)
#define INFLATE_ADJUST_WINDOW_SIZE(n) MAX(n, HB_SIZE)
#endif
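`INFLATE_TYPEDO_HOOK` expands inside the `TYPEDO` case of the inflate state machine and maps the three `dfltcc_inflate_action` values onto control flow: `DFLTCC_INFLATE_CONTINUE` leaves the `switch` so the loop re-dispatches on the updated mode, `DFLTCC_INFLATE_BREAK` jumps to `inf_leave`, and `DFLTCC_INFLATE_SOFTWARE` falls through to the software decoder. The sketch below models that dispatch; `toy_action`, `toy_mode` and `toy_inflate` are hypothetical, and the hardware result is passed in as a parameter instead of being computed.

```c
#include <assert.h>

typedef enum { ACT_CONTINUE, ACT_BREAK, ACT_SOFTWARE } toy_action;
typedef enum { MODE_TYPEDO, MODE_CHECK, MODE_DONE } toy_mode;

/* Returns 0 if hardware handled the data, 1 if the software path ran,
 * and -1 if decoding stopped early (the inf_leave case). */
static int toy_inflate(toy_action hw_action) {
    toy_mode mode = MODE_TYPEDO;
    int used_software = 0;
    for (;;) switch (mode) {
    case MODE_TYPEDO:
        if (hw_action == ACT_CONTINUE) {
            mode = MODE_CHECK;   /* like state->mode = CHECK after the last block */
            break;               /* leaves the switch; the for loop re-dispatches */
        } else if (hw_action == ACT_BREAK) {
            return -1;           /* like goto inf_leave: wait for more input/output */
        }
        used_software = 1;       /* ACT_SOFTWARE: the software decoder runs here */
        mode = MODE_CHECK;
        break;
    case MODE_CHECK:
        mode = MODE_DONE;
        break;
    case MODE_DONE:
        return used_software;
    }
}
```

This is also why `INFLATE_TYPEDO_HOOK`, unlike the other hooks in this header, is not wrapped in `do { ... } while (0)`: a `break` inside such a wrapper would only leave the `do` loop instead of the enclosing `switch`.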

@@ -0,0 +1,14 @@
#include "zbuild.h"
#include "s390_features.h"
#ifdef HAVE_SYS_AUXV_H
# include <sys/auxv.h>
#endif
#ifndef HWCAP_S390_VXRS
#define HWCAP_S390_VXRS HWCAP_S390_VX
#endif
void Z_INTERNAL s390_check_features(struct s390_cpu_features *features) {
features->has_vx = getauxval(AT_HWCAP) & HWCAP_S390_VXRS;
}
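`s390_check_features` reads the vector facility from the Linux `AT_HWCAP` auxiliary vector. The DFLTCC checks in `dfltcc_detail.h` instead test the bit strings returned by STFLE and by the DFLTCC query function with `is_bit_set`, which numbers bit 0 as the most significant bit of byte 0. The sketch below copies `is_bit_set` and `clear_bit` verbatim; the 24-byte buffer and its contents are hypothetical.

```c
#include <assert.h>

/* Copied from dfltcc_detail.h: bit 0 is the most significant bit of bits[0]. */
static inline int is_bit_set(const char *bits, int n) {
    return bits[n / 8] & (1 << (7 - (n % 8)));
}

static inline void clear_bit(char *bits, int n) {
    bits[n / 8] &= ~(1 << (7 - (n % 8)));
}
```

For `DFLTCC_FACILITY` (151) the bit lives in byte 151 / 8 = 18 under mask `1 << (7 - 151 % 8)`, that is 0x01, which is why `is_dfltcc_enabled` reserves (151 / 64) + 1 = 3 doublewords (24 bytes) for STFLE.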

Some files were not shown because too many files have changed in this diff.