Add support for WebP in DecodeImage

WebP offers both lossless and lossy compression that can be superior to JPEG or
PNG. Requests for WebP support directly in TF have come up before (e.g.,
https://github.com/tensorflow/tensorflow/issues/18250), but support has so far
been kept out of tree in TFIO (https://github.com/tensorflow/io/pull/43).
Unfortunately, the TFIO implementation hardcodes channels_ = 4 and doesn't
offer the same ShapeInferenceFn-style support that DecodeImage does.

Let's bring native WebP decode support directly into TF for tf.data. Animation
is supported: similar to decode_gif, decode_webp produces a tensor of shape
[num_frames, height, width, channels], even for a still frame (num_frames = 1).
To get 3-D tensors instead, use decode_image with expand_animations = False.

Note: libwebp (and perhaps the WebP format itself) only supports 4-channel
animation, so channels will always be 4 for animations.
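
A minimal usage sketch (hypothetical file paths; decode_webp is exported
below as tf.io.decode_webp and tf.image.decode_webp):

  import tensorflow as tf

  data = tf.io.read_file("animation.webp")  # hypothetical path
  frames = tf.io.decode_webp(data)  # animations -> [num_frames, height, width, 4]
  still = tf.io.decode_image(data, expand_animations=False)  # -> [height, width, channels]
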
PiperOrigin-RevId: 736966060
A. Unique TensorFlower, 2025-03-14 13:22:37 -07:00 (committed by TensorFlower Gardener)
parent c7221f9f46
commit 5aeb2e6531
28 changed files with 740 additions and 64 deletions


@@ -2035,6 +2035,7 @@ filegroup(
"//tensorflow/core/lib/gif/testdata:gif_testdata",
# BMP data
"//tensorflow/core/lib/bmp:bmp_testdata",
"//tensorflow/core/lib/webp:testdata",
],
visibility = ["//visibility:public"],
)


@@ -28,25 +28,27 @@ END
attr {
name: "expand_animations"
description: <<END
Controls the output shape of the returned op. If True, the returned op will
produce a 3-D tensor for PNG, JPEG, and BMP files; and a 4-D tensor for all
GIFs, whether animated or not. If False, the returned op will produce a 3-D
tensor for all file types and will truncate animated GIFs to the first frame.
Controls the output shape of the returned op. If True, the returned op
will produce a 3-D tensor for PNG, JPEG, and BMP files; and a 4-D
tensor for all GIFs and WebP images, whether animated or not. If
False, the returned op will produce a 3-D tensor for all file types
and will truncate animated images to the first frame.
END
}
summary: "Function for decode_bmp, decode_gif, decode_jpeg, and decode_png."
summary: "Function for decode_bmp, decode_gif, decode_jpeg, decode_webp, and decode_png."
description: <<END
Detects whether an image is a BMP, GIF, JPEG, or PNG, and performs the
Detects whether an image is a BMP, GIF, JPEG, WebP, or PNG, and performs the
appropriate operation to convert the input bytes string into a Tensor of type
dtype.
*NOTE*: decode_gif returns a 4-D array [num_frames, height, width, 3], as
opposed to decode_bmp, decode_jpeg and decode_png, which return 3-D arrays
[height, width, num_channels]. Make sure to take this into account when
constructing your graph if you are intermixing GIF files with BMP, JPEG, and/or
PNG files. Alternately, set the expand_animations argument of this function to
False, in which case the op will return 3-dimensional tensors and will truncate
animated GIF files to the first frame.
*NOTE*: decode_gif and decode_webp return a 4-D
array [num_frames, height, width, channels], as opposed to decode_bmp,
decode_jpeg, and decode_png, which always return 3-D arrays [height,
width, num_channels]. Make sure to take this into account when
constructing your graph if you are intermixing animated files with
BMP, JPEG, and/or PNG files. Alternately, set the expand_animations
argument of this function to False, in which case the op will return
3-dimensional tensors and will truncate animations to the first frame.
*NOTE*: If the first frame of an animated GIF does not occupy the entire
canvas (maximum frame width x maximum frame height), then it fills the


@@ -0,0 +1,35 @@
op {
graph_op_name: "DecodeWebP"
in_arg {
name: "contents"
description: <<END
0-D. The WebP-encoded image.
END
}
out_arg {
name: "image"
description: <<END
4-D with shape `[num_frames, height, width, channels]`.
END
}
attr {
name: "channels"
description: <<END
Number of color channels for the decoded image.
END
}
summary: "Decode a WebP-encoded image to a uint8 tensor."
description: <<END
The attr `channels` indicates the desired number of color channels for the
decoded image.
Accepted values are:
* 0: Use the number of channels in the WebP-encoded image.
* 3: output an RGB image.
* 4: output an RGBA image.
The number of channels must currently match that of the underlying file.
For WebP animations, only 4-channel RGBA is supported.
END
}
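
To illustrate the channels attr, a hedged Python sketch (assuming a
hypothetical 3-channel still image):

  data = tf.io.read_file("still_rgb.webp")   # hypothetical path
  auto = tf.io.decode_webp(data)             # channels=0: use the file's count -> [1, h, w, 3]
  rgb = tf.io.decode_webp(data, channels=3)  # matches the file -> [1, h, w, 3]
  # tf.io.decode_webp(data, channels=4) would raise InvalidArgumentError,
  # since the channel count must currently match the underlying file.
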


@@ -0,0 +1,4 @@
op {
graph_op_name: "DecodeWebP"
visibility: HIDDEN
}


@@ -7173,6 +7173,7 @@ cc_library(
"//tensorflow/core/framework:tensor_shape_proto_cc",
"//tensorflow/core/framework:types_proto_cc",
"//tensorflow/core/lib/png:png_io",
"//tensorflow/core/lib/webp:webp_io",
"//tensorflow/core/platform:strong_hash",
"//tensorflow/core/platform:types",
"//tensorflow/core/protobuf:autotuning_proto_cc",


@@ -132,6 +132,7 @@ IMAGE_DEPS = [
"//tensorflow/core:lib",
"//tensorflow/core:lib_internal",
"//tensorflow/core/lib/png:png_io",
"//tensorflow/core/lib/webp:webp_io",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core/framework:bounds_check",
"//tensorflow/core/kernels:eigen_helpers",
@@ -461,6 +462,7 @@ cc_library(
"//tensorflow/core/framework:types_proto_cc",
"//tensorflow/core/lib/core:status",
"//tensorflow/core/lib/png:png_io",
"//tensorflow/core/lib/webp:webp_io",
"//tensorflow/core/platform:byte_order",
"//tensorflow/core/platform:errors",
"@com_google_absl//absl/strings",


@@ -18,8 +18,10 @@ limitations under the License.
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <limits>
#include <memory>
#include <string>
#define EIGEN_USE_THREADS
@@ -38,6 +40,7 @@ limitations under the License.
#include "tensorflow/core/lib/gtl/cleanup.h"
#include "tensorflow/core/lib/jpeg/jpeg_mem.h"
#include "tensorflow/core/lib/png/png_io.h"
#include "tensorflow/core/lib/webp/webp_io.h"
#include "tensorflow/core/platform/byte_order.h"
#include "tensorflow/core/platform/errors.h"
#include "tensorflow/core/platform/logging.h"
@@ -57,6 +60,9 @@ static const char kGifMagicBytes[] = "\x47\x49\x46\x38";
static const char kBmpMagicBytes[] = "\x42\x4d";
// The 4th byte of JPEG is '\xe0' or '\xe1', so check just the first three.
static const char kJpegMagicBytes[] = "\xff\xd8\xff";
// WebP is RIFF????WEBP
static const char kRiffMagicBytes[] = "\x52\x49\x46\x46";
static const char kWebpMagicBytes[] = "\x57\x45\x42\x50";
enum FileFormat {
kUnknownFormat = 0,
@@ -64,6 +70,7 @@ enum FileFormat {
kJpgFormat = 2,
kGifFormat = 3,
kBmpFormat = 4,
kWebpFormat = 5,
};
// Classify the contents of a file based on starting bytes (the magic number).
@@ -72,12 +79,19 @@ FileFormat ClassifyFileFormat(absl::string_view data) {
if (absl::StartsWith(data, kPngMagicBytes)) return kPngFormat;
if (absl::StartsWith(data, kGifMagicBytes)) return kGifFormat;
if (absl::StartsWith(data, kBmpMagicBytes)) return kBmpFormat;
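// A WebP file begins 'R','I','F','F', a 4-byte little-endian file size,
// then 'W','E','B','P'.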
if (absl::StartsWith(data, kRiffMagicBytes) && data.size() > 12) {
// Move forward by RIFF plus 4 size bytes.
data.remove_prefix(8);
if (absl::StartsWith(data, kWebpMagicBytes)) return kWebpFormat;
}
return kUnknownFormat;
}
// Decode an image. Supported image formats are JPEG, PNG, GIF and BMP. This is
// a newer version of `DecodeImageOp` for enabling image data parsing to take
// place in kernels only, reducing security vulnerabilities and redundancy.
// Decode an image. Supported image formats are JPEG, PNG, GIF, BMP, and WebP.
// This is a newer version of `DecodeImageOp` for enabling image data parsing to
// take place in kernels only, reducing security vulnerabilities and redundancy.
class DecodeImageV2Op : public OpKernel {
public:
explicit DecodeImageV2Op(OpKernelConstruction* context) : OpKernel(context) {
@@ -93,7 +107,8 @@ class DecodeImageV2Op : public OpKernel {
OP_REQUIRES(context,
op_type_ == "DecodeJpeg" || op_type_ == "DecodeAndCropJpeg" ||
op_type_ == "DecodePng" || op_type_ == "DecodeGif" ||
op_type_ == "DecodeBmp" || op_type_ == "DecodeImage",
op_type_ == "DecodeBmp" || op_type_ == "DecodeWebP" ||
op_type_ == "DecodeImage",
errors::InvalidArgument("Bad op type ", op_type_));
// Get attributes from `DecodeJpeg` and `DecodeAndCropJpeg` op
@@ -218,10 +233,14 @@ class DecodeImageV2Op : public OpKernel {
case kBmpFormat:
DecodeBmpV2(context, input);
break;
case kWebpFormat:
DecodeWebP(context, input);
break;
case kUnknownFormat:
OP_REQUIRES(context, false,
errors::InvalidArgument("Unknown image file format. One of "
"JPEG, PNG, GIF, BMP required."));
OP_REQUIRES(
context, false,
errors::InvalidArgument("Unknown image file format. One of "
"JPEG, PNG, GIF, BMP, WebP required."));
break;
}
}
@@ -666,6 +685,93 @@ class DecodeImageV2Op : public OpKernel {
}
}
void DecodeWebP(OpKernelContext* context, absl::string_view input) {
OP_REQUIRES(context, channels_ == 0 || channels_ == 3 || channels_ == 4,
errors::InvalidArgument("WebP only supports 3 or 4 channels"));
OP_REQUIRES(context, data_type_ == DataType::DT_UINT8,
errors::InvalidArgument("WebP only supports uint8 for dtype"));
int width, height, channels;
bool has_animation;
OP_REQUIRES(context,
webp::DecodeWebPHeader(input, &width, &height, &channels,
&has_animation),
errors::InvalidArgument("Failed to decode WebP header."));
// Require either auto-detection of channels (channels_ == 0) or a channel
// count that matches the input image.
OP_REQUIRES(context, channels_ == 0 || channels_ == channels,
errors::InvalidArgument(
"Number of channels requested does not match input"));
if (!has_animation) {
Tensor* output = nullptr;
// If this is DecodeImage w/ expand_animations_ = False, return a 3D
// tensor. Otherwise, return a 4D tensor with num_frames = 1.
if (expand_animations_) {
OP_REQUIRES_OK(
context,
context->allocate_output(
0, TensorShape({1, height, width, channels}), &output));
} else {
OP_REQUIRES_OK(context,
context->allocate_output(
0, TensorShape({height, width, channels}), &output));
}
// Actually decode the image into the output buffer.
OP_REQUIRES(context,
webp::DecodeWebPImage(input, output->flat<uint8>().data(),
width, height, channels),
errors::InvalidArgument("Failed to decode WebP image."));
// Note: Here we could also perform casting to other dtypes, but users can
// also just convert in their own code.
return;
}
// Handle the animation case.
OP_REQUIRES(
context, channels_ == 0 || channels_ == 4,
errors::InvalidArgument("WebP Animation must be 4 channel RGBA"));
Tensor* output = nullptr;
std::string error_string;
uint8_t* buffer = webp::DecodeWebPAnimation(
input,
[&](int num_frames, int width, int height, int channels) -> uint8_t* {
// If expand_animations is false, we want {height, width, channels};
// otherwise, we want {num_frames, height, width, channels}, even if
// it's a single frame.
absl::Status status;
if (expand_animations_) {
status = context->allocate_output(
0, TensorShape({num_frames, height, width, channels}), &output);
} else {
status = context->allocate_output(
0, TensorShape({height, width, channels}), &output);
}
if (!status.ok()) {
VLOG(1) << status;
context->SetStatus(status);
return nullptr;
}
return output->flat<uint8>().data();
},
&error_string, expand_animations_);
OP_REQUIRES(context, buffer != nullptr,
errors::InvalidArgument("Failed to decode WebP Animation: ",
error_string));
// All done, output should have been filled in by DecodeWebPAnimation.
}
private:
void DecodeBMP(const uint8* input, const int row_size, uint8* const output,
const int width, const int height, const int output_channels,
@@ -686,6 +792,7 @@ REGISTER_KERNEL_BUILDER(Name("DecodeAndCropJpeg").Device(DEVICE_CPU),
REGISTER_KERNEL_BUILDER(Name("DecodeImage").Device(DEVICE_CPU),
DecodeImageV2Op);
REGISTER_KERNEL_BUILDER(Name("DecodeBmp").Device(DEVICE_CPU), DecodeImageV2Op);
REGISTER_KERNEL_BUILDER(Name("DecodeWebP").Device(DEVICE_CPU), DecodeImageV2Op);
void DecodeImageV2Op::DecodeBMP(const uint8* input, const int row_size,
uint8* const output, const int width,


@@ -0,0 +1,36 @@
load("//tensorflow:tensorflow.bzl", "if_google")
load(
"//tensorflow/core/platform:rules_cc.bzl",
"cc_library",
)
package(
# copybara:uncomment default_applicable_licenses = ["//tensorflow:license"],
default_visibility = [
"//tensorflow:__subpackages__",
],
licenses = ["notice"],
)
cc_library(
name = "webp_io",
srcs = ["webp_io.cc"],
hdrs = ["webp_io.h"],
features = ["-layering_check"],
deps = [
"//tensorflow/core/platform:types",
"@com_google_absl//absl/base",
"@com_google_absl//absl/log",
"@com_google_absl//absl/log:check",
"@com_google_absl//absl/strings:string_view",
"@libwebp//:webp",
] + if_google([
"@libwebp//:webp_demux",
]),
)
alias(
name = "testdata",
actual = "//tensorflow/core/lib/webp/testdata:webp_testdata",
visibility = ["//tensorflow/core:__pkg__"],
)

tensorflow/core/lib/webp/testdata/BUILD (new file, 18 lines)

@@ -0,0 +1,18 @@
# Description:
# WebP test data.
load("//tensorflow:tensorflow.default.bzl", "filegroup")
package(
# copybara:uncomment default_applicable_licenses = ["//tensorflow:license"],
licenses = ["notice"],
)
filegroup(
name = "webp_testdata",
srcs = glob(["*.webp"]),
visibility = [
"//tensorflow/core:__pkg__",
"//tensorflow/core/lib/webp:__pkg__",
],
)

(Four binary WebP test images added under tensorflow/core/lib/webp/testdata/,
not shown: 5.9 KiB, 9.1 KiB, 306 B, and 19 KiB.)


@@ -0,0 +1,142 @@
/* Copyright 2025 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// Functions to read images in WebP format.
#include "tensorflow/core/lib/webp/webp_io.h"
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include "absl/cleanup/cleanup.h"
#include "absl/strings/str_cat.h"
#include "absl/strings/string_view.h"
#include "third_party/libwebp/src/webp/decode.h"
#include "third_party/libwebp/src/webp/demux.h"
#include "third_party/libwebp/src/webp/mux_types.h"
namespace tensorflow {
namespace webp {
bool DecodeWebPHeader(absl::string_view webp_string, int* width, int* height,
int* channels, bool* has_animation) {
const uint8_t* input_data =
reinterpret_cast<const uint8_t*>(webp_string.data());
const size_t input_size = webp_string.size();
WebPBitstreamFeatures features;
if (WebPGetFeatures(input_data, input_size, &features) != VP8_STATUS_OK) {
return false;
}
*width = features.width;
*height = features.height;
*channels = features.has_alpha ? 4 : 3;
*has_animation = features.has_animation;
return true;
}
bool DecodeWebPImage(absl::string_view webp_string, uint8_t* output, int width,
int height, int channels) {
const uint8_t* input_data =
reinterpret_cast<const uint8_t*>(webp_string.data());
const size_t input_size = webp_string.size();
const int row_stride = width * channels * sizeof(uint8_t);
const size_t output_size = height * row_stride;
switch (channels) {
case 3:
return ::WebPDecodeRGBInto(input_data, input_size, output, output_size,
row_stride) != nullptr;
case 4:
return ::WebPDecodeRGBAInto(input_data, input_size, output, output_size,
row_stride) != nullptr;
default:
// Invalid number of channels.
return false;
}
}
uint8_t* DecodeWebPAnimation(
absl::string_view webp_string,
const std::function<uint8_t*(int, int, int, int)>& allocate_output,
std::string* error_string, bool expand_animations) {
WebPData webp_data = {reinterpret_cast<const uint8_t*>(webp_string.data()),
webp_string.size()};
// Use the default decoder options, i.e. single-threaded RGBA decode.
WebPAnimDecoder* decoder = WebPAnimDecoderNew(&webp_data, nullptr);
if (decoder == nullptr) {
*error_string = "failed to decode WebP Animation";
return nullptr;
}
const auto cleanup =
absl::MakeCleanup([decoder] { WebPAnimDecoderDelete(decoder); });
WebPAnimInfo info;
if (!WebPAnimDecoderGetInfo(decoder, &info)) {
*error_string = "failed to get WebP Animation Info";
return nullptr;
}
const uint32_t width = info.canvas_width;
const uint32_t height = info.canvas_height;
// If we only want the first frame, expand_animations will be false.
const uint32_t num_frames = (expand_animations) ? info.frame_count : 1;
const uint32_t num_channels = 4; /* libwebp only supports RGBA animations */
const size_t bytes_per_frame = width * height * num_channels;
uint8_t* output = allocate_output(num_frames, width, height, num_channels);
if (output == nullptr) {
*error_string = "failed to allocate output for WebP Animation";
return nullptr;
}
size_t frame = 0;
while (WebPAnimDecoderHasMoreFrames(decoder)) {
uint8_t* buffer;
int timestamp_dummy;
if (!WebPAnimDecoderGetNext(decoder, &buffer, &timestamp_dummy)) {
*error_string = absl::StrCat("failed to decode frame: ", frame);
return nullptr;
}
// Copy buffer (owned by decoder) into our output.
uint8_t* frame_output = output + frame * bytes_per_frame;
memcpy(frame_output, buffer, bytes_per_frame);
// Move on to the next frame.
frame++;
// Exit early if we only want to grab the first frame.
if (!expand_animations) break;
}
// We should have gotten all the frames in num_frames.
if (frame != num_frames) {
*error_string =
absl::StrCat("only read ", frame, " of ", num_frames, " frames");
return nullptr;
}
return output;
}
} // namespace webp
} // namespace tensorflow


@@ -0,0 +1,74 @@
/* Copyright 2025 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// Functions to read images in WebP format.
//
// First call DecodeWebPHeader with an input WebP image as a string_view, to get
// the width, height, channels, and whether or not the WebP file is an
// animation. Then call DecodeWebPImage with an appropriately sized output
// buffer to hold the decoded image as either RGB or RGBA (based on channels),
// or DecodeWebPAnimation for animated files.
//
//
// int width, height, channels;
// bool has_animation;
// DecodeWebPHeader(input_bytes, &width, &height, &channels, &has_animation);
//
// if (has_animation) { DecideIfYouWantFrame0(); }
//
// uint8_t* output_bytes = new uint8_t[width * height * channels];
// DecodeWebPImage(input_bytes, output_bytes, width, height, channels);
//
#ifndef TENSORFLOW_CORE_LIB_WEBP_WEBP_IO_H_
#define TENSORFLOW_CORE_LIB_WEBP_WEBP_IO_H_
#include <functional>
#include <string>
#include "absl/strings/string_view.h"
#include "tensorflow/core/platform/types.h"
namespace tensorflow {
namespace webp {
// Given an input encoded in WebP as `webp_string`, extract the width, height,
// number of channels, and whether or not the file is an animation. Return false
// on failure or true for success.
bool DecodeWebPHeader(absl::string_view webp_string, int* width, int* height,
int* channels, bool* has_animation);
// Decode the first image from `webp_string` into the output buffer
// `output`. `output` is assumed to be width * height * channels *
// sizeof(uint8_t) or larger.
bool DecodeWebPImage(absl::string_view webp_string, uint8_t* output, int width,
int height, int channels);
// Decode a sequence of images in the animation from `webp_string` into a
// dynamically allocated output buffer via `allocate_output`. `allocate_output`
// takes its arguments as (num_frames, width, height, channels). The channel
// count is (currently) always 4 (RGBA).
//
// Note: Decoding a WebP animation, even to get the number of frames, reads the
// entire image into memory, hence this callback mechanism.
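//
// For illustration, a sketch of a vector-backed allocator (an illustrative
// assumption, not the only way to use this API; error handling elided):
//
//   std::string error;
//   std::vector<uint8_t> pixels;
//   uint8_t* data = DecodeWebPAnimation(
//       webp_bytes,
//       [&](int num_frames, int width, int height, int channels) -> uint8_t* {
//         pixels.resize(static_cast<size_t>(num_frames) * width * height *
//                       channels);
//         return pixels.data();
//       },
//       &error, /*expand_animations=*/true);
//   if (data == nullptr) { /* report `error` */ }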
uint8_t* DecodeWebPAnimation(
absl::string_view webp_string,
const std::function<uint8_t*(int, int, int, int)>& allocate_output,
std::string* error_string, bool expand_animations);
} // namespace webp
} // namespace tensorflow
#endif // TENSORFLOW_CORE_LIB_WEBP_WEBP_IO_H_


@@ -15,9 +15,11 @@ limitations under the License.
#include <algorithm>
#include "xla/tsl/platform/statusor.h"
#include "tensorflow/core/framework/common_shape_fns.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/platform/errors.h"
namespace tensorflow {
@@ -70,22 +72,26 @@ absl::Status ResizeShapeFn(InferenceContext* c) {
c->Dim(input, 3));
}
absl::Status DecodeImageShapeFn(InferenceContext* c) {
ShapeHandle unused;
TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &unused));
DimensionHandle channels_dim;
absl::StatusOr<DimensionHandle> GetChannelsDim(InferenceContext* c) {
int32_t channels;
TF_RETURN_IF_ERROR(c->GetAttr("channels", &channels));
if (channels == 0) {
channels_dim = c->UnknownDim();
} else {
if (channels < 0) {
return errors::InvalidArgument("channels must be non-negative, got ",
channels);
}
channels_dim = c->MakeDim(channels);
return c->UnknownDim();
}
if (channels < 0) {
return errors::InvalidArgument("channels must be non-negative, got ",
channels);
}
return c->MakeDim(channels);
}
absl::Status DecodeImageShapeFn(InferenceContext* c) {
ShapeHandle unused;
TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &unused));
TF_ASSIGN_OR_RETURN(DimensionHandle channels_dim, GetChannelsDim(c));
c->set_output(0, c->MakeShape({InferenceContext::kUnknownDim,
InferenceContext::kUnknownDim, channels_dim}));
return absl::OkStatus();
@@ -93,36 +99,26 @@ absl::Status DecodeImageShapeFn(InferenceContext* c) {
absl::Status DecodeImageV2ShapeFn(InferenceContext* c) {
ShapeHandle unused;
int32_t channels;
bool expand_animations;
DimensionHandle channels_dim;
TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &unused));
TF_RETURN_IF_ERROR(c->GetAttr("channels", &channels));
TF_ASSIGN_OR_RETURN(DimensionHandle channels_dim, GetChannelsDim(c));
TF_RETURN_IF_ERROR(c->GetAttr("expand_animations", &expand_animations));
if (channels == 0) {
channels_dim = c->UnknownDim();
} else {
if (channels < 0) {
return errors::InvalidArgument("channels must be non-negative, got ",
channels);
}
channels_dim = c->MakeDim(channels);
}
// `expand_animations` set to true will return 4-D shapes for GIF. 3-D shapes
// will be returned for jpg, png, and bmp. `expand_animations` set to false
// will always return 3-D shapes for all (jpg, png, bmp, gif).
// `expand_animations` set to true will return 4-D shapes for GIF and
// WebP. 3-D shapes will be returned for jpg, png, and
// bmp. `expand_animations` set to false will always return 3-D shapes for all
// (jpg, png, bmp, gif, webp). So we *may* have a mix of 3D and 4D
// shapes. Just return unknown.
if (expand_animations) {
c->set_output(0, c->UnknownShape());
return absl::OkStatus();
} else {
c->set_output(0,
c->MakeShape({InferenceContext::kUnknownDim,
InferenceContext::kUnknownDim, channels_dim}));
return absl::OkStatus();
}
// expand_animations is False. We'll have a 3D tensor.
c->set_output(0, c->MakeShape({InferenceContext::kUnknownDim,
InferenceContext::kUnknownDim, channels_dim}));
return absl::OkStatus();
}
absl::Status EncodeImageShapeFn(InferenceContext* c) {
@@ -640,6 +636,7 @@ REGISTER_OP("DecodeBmp")
REGISTER_OP("DecodeGif")
.Input("contents: string")
.Output("image: uint8")
// Always a 4D tensor, and no Alpha support, so channels=3.
.SetShapeFn([](InferenceContext* c) {
ShapeHandle unused;
TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &unused));
@@ -649,6 +646,26 @@ REGISTER_OP("DecodeGif")
return absl::OkStatus();
});
// --------------------------------------------------------------------------
REGISTER_OP("DecodeWebP")
.Input("contents: string")
.Attr("channels: int = 0")
// Add this dtype arg for now, even if we don't yet support conversion.
.Attr("dtype: {uint8} = DT_UINT8")
.Output("image: dtype")
.SetShapeFn([](InferenceContext* c) {
ShapeHandle unused;
TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &unused));
TF_ASSIGN_OR_RETURN(DimensionHandle channels_dim, GetChannelsDim(c));
// Always a 4D tensor, but channels is dynamic.
c->set_output(
0, c->MakeShape({InferenceContext::kUnknownDim,
InferenceContext::kUnknownDim,
InferenceContext::kUnknownDim, channels_dim}));
return absl::OkStatus();
});
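// For illustration, the static shapes this implies from Python (a sketch
// mirroring the shape tests added below; `contents` is a scalar string
// tensor in graph mode):
//
//   img = tf.io.decode_webp(contents, channels=3)
//   img.shape   # (None, None, None, 3): always 4-D, channels known
//   anim = tf.io.decode_image(contents, expand_animations=True)
//   anim.shape  # unknown: may be 3-D or 4-D depending on the file
//   flat = tf.io.decode_image(contents, expand_animations=False)
//   flat.shape  # (None, None, None): always 3-D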
// --------------------------------------------------------------------------
REGISTER_OP("RGBToHSV")
.Input("images: T")


@@ -3223,6 +3223,11 @@ decode_png = tf_export(
'image.decode_png',
v1=['io.decode_png', 'image.decode_png'])(
dispatch.add_dispatch_support(gen_image_ops.decode_png))
decode_webp = tf_export(
'io.decode_webp',
'image.decode_webp',
v1=['io.decode_webp', 'image.decode_webp'],
)(dispatch.add_dispatch_support(gen_image_ops.decode_web_p))
encode_jpeg = tf_export(
'io.encode_jpeg',
@@ -3278,17 +3283,18 @@ def decode_image(contents,
expand_animations=True):
"""Function for `decode_bmp`, `decode_gif`, `decode_jpeg`, and `decode_png`.
Detects whether an image is a BMP, GIF, JPEG, or PNG, and performs the
Detects whether an image is a BMP, GIF, JPEG, WebP, or PNG, and performs the
appropriate operation to convert the input bytes `string` into a `Tensor`
of type `dtype`.
Note: `decode_gif` returns a 4-D array `[num_frames, height, width, 3]`, as
opposed to `decode_bmp`, `decode_jpeg` and `decode_png`, which return 3-D
arrays `[height, width, num_channels]`. Make sure to take this into account
when constructing your graph if you are intermixing GIF files with BMP, JPEG,
and/or PNG files. Alternately, set the `expand_animations` argument of this
function to `False`, in which case the op will return 3-dimensional tensors
and will truncate animated GIF files to the first frame.
Note: `decode_gif` and `decode_webp` return a 4-D array of
`[num_frames, height, width, channels]`, as opposed to the other image
formats, which always return 3-D arrays of the form `[height, width,
num_channels]`. Make sure to take this into account when
constructing your graph if you are intermixing animations with static
images. Alternately, set the `expand_animations` argument of this
function to `False`, in which case the op will return 3-dimensional
tensors and will truncate animations to the first frame.
NOTE: If the first frame of an animated GIF does not occupy the entire
canvas (maximum frame width x maximum frame height), then it fills the
@@ -3304,10 +3310,9 @@ def decode_image(contents,
name: A name for the operation (optional)
expand_animations: An optional `bool`. Defaults to `True`. Controls the
shape of the returned op's output. If `True`, the returned op will produce
a 3-D tensor for PNG, JPEG, and BMP files; and a 4-D tensor for all GIFs,
whether animated or not. If `False`, the returned op will produce a 3-D
tensor for all file types and will truncate animated GIFs to the first
frame.
a 4-D tensor for all GIFs and WebP images, animated or not, and a 3-D
tensor in all other cases. If `False`, the returned op will produce a 3-D
tensor for all file types and will truncate animations to the first frame.
Returns:
`Tensor` with type `dtype` and a 3- or 4-dimensional shape, depending on


@@ -4898,6 +4898,102 @@ class GifTest(test_util.TensorFlowTestCase):
self.assertAllEqual(image[2], frame2)
class WebpTest(test_util.TensorFlowTestCase, parameterized.TestCase):
def _path(self, name):
base = "tensorflow/core/lib/webp/testdata/"
return os.path.join(base, name)
@parameterized.named_parameters([
("_rgbNoise", "RGB_noise_large_pixels_115x115.webp", (1, 115, 115, 3)),
("_lossless", "lossless_raw.webp", (1, 32, 32, 3)),
("_alpha", "lossy_alpha1.webp", (1, 307, 1000, 4)),
])
def testRegularFile(self, filename, expected_dimensions):
# Read a real WebP image, via both APIs and check they're equal.
with self.cached_session():
webp = io_ops.read_file(self._path(filename))
image0 = image_ops.decode_webp(webp)
image1 = image_ops.decode_image(webp)
webp, image0, image1 = self.evaluate([webp, image0, image1])
self.assertEqual(image0.shape, expected_dimensions)
self.assertAllEqual(image0, image1)
def testAnimation(self):
# Read a WebP animation file via both APIs and check they're equal.
with self.cached_session():
webp = io_ops.read_file(self._path("bouncy_ball.webp"))
expected_dimensions = (15, 450, 450, 4)
image0 = image_ops.decode_webp(webp)
image1 = image_ops.decode_image(webp, expand_animations=True)
webp, image0, image1 = self.evaluate([webp, image0, image1])
self.assertEqual(image0.shape, expected_dimensions)
self.assertAllEqual(image0, image1)
def testAnimationFrame0(self):
# Read a WebP animation file via both APIs, dropping the animation in
# one, and compare frame 0.
with self.cached_session():
webp = io_ops.read_file(self._path("bouncy_ball.webp"))
expected_anim_dimensions = (15, 450, 450, 4)
expected_still_dimensions = (450, 450, 4)
# decode_webp will return all the frames, but we should get the
# same frame 0 in both cases.
image0 = image_ops.decode_webp(webp)
image1 = image_ops.decode_image(webp, expand_animations=False)
webp, image0, image1 = self.evaluate([webp, image0, image1])
self.assertEqual(image0.shape, expected_anim_dimensions)
self.assertEqual(image1.shape, expected_still_dimensions)
# Compare frame0 of image0 to image1.
self.assertAllEqual(image0[0, ...], image1)
def testChannelsArg(self):
# Shape function requires placeholders and a graph.
with ops.Graph().as_default():
with self.cached_session():
webp = io_ops.read_file(
self._path("RGB_noise_large_pixels_115x115.webp")
)
for channels in 0, 3, 4:
image = image_ops.decode_webp(webp, channels=channels)
self.assertEqual(
image.get_shape().as_list(), [None, None, None, channels or None]
)
def testInvalidChannels(self):
with self.cached_session():
webp = io_ops.read_file(self._path("RGB_noise_large_pixels_115x115.webp"))
# DecodeImage supports grayscale, but WebP does not.
message = "WebP only supports 3 or 4 channels"
with self.assertRaisesRegex(
(errors.InvalidArgumentError, ValueError), message
):
op = image_ops.decode_webp(webp, channels=1)
self.evaluate(op)
@parameterized.named_parameters(
[("_int8", np.int8), ("_int16", np.int16), ("_float32", np.float32)]
)
def testUnsupportedDtypes(self, dtype):
with self.cached_session():
webp = io_ops.read_file(self._path("RGB_noise_large_pixels_115x115.webp"))
message = "WebP only supports uint8"
with self.assertRaisesRegex(
(errors.InvalidArgumentError, ValueError), message
):
# Note: we're testing with decode_image, since decode_webp
# *statically* does not support anything other than uint8.
op = image_ops.decode_image(webp, dtype=dtype)
self.evaluate(op)
class ConvertImageTest(test_util.TensorFlowTestCase):
def _convert(self, original, original_dtype, output_dtype, expected):


@@ -72,6 +72,10 @@ tf_module {
name: "decode_png"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "decode_webp"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "draw_bounding_boxes"
argspec: "args=[\'images\', \'boxes\', \'name\', \'colors\'], varargs=None, keywords=None, defaults=[\'None\', \'None\'], "


@@ -100,6 +100,10 @@ tf_module {
name: "decode_raw"
argspec: "args=[\'input_bytes\', \'out_type\', \'little_endian\', \'name\', \'bytes\'], varargs=None, keywords=None, defaults=[\'None\', \'None\', \'True\', \'None\', \'None\'], "
}
member_method {
name: "decode_webp"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "deserialize_many_sparse"
argspec: "args=[\'serialized_sparse\', \'dtype\', \'rank\', \'name\'], varargs=None, keywords=None, defaults=[\'None\', \'None\'], "


@@ -1236,6 +1236,10 @@ tf_module {
name: "DecodeWav"
argspec: "args=[\'contents\', \'desired_channels\', \'desired_samples\', \'name\'], varargs=None, keywords=None, defaults=[\'-1\', \'-1\', \'None\'], "
}
member_method {
name: "DecodeWebP"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "DeepCopy"
argspec: "args=[\'x\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "


@@ -72,6 +72,10 @@ tf_module {
name: "decode_png"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "decode_webp"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "draw_bounding_boxes"
argspec: "args=[\'images\', \'boxes\', \'colors\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "


@@ -80,6 +80,10 @@ tf_module {
name: "decode_raw"
argspec: "args=[\'input_bytes\', \'out_type\', \'little_endian\', \'fixed_length\', \'name\'], varargs=None, keywords=None, defaults=[\'True\', \'None\', \'None\'], "
}
member_method {
name: "decode_webp"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "deserialize_many_sparse"
argspec: "args=[\'serialized_sparse\', \'dtype\', \'rank\', \'name\'], varargs=None, keywords=None, defaults=[\'None\', \'None\'], "


@@ -1236,6 +1236,10 @@ tf_module {
name: "DecodeWav"
argspec: "args=[\'contents\', \'desired_channels\', \'desired_samples\', \'name\'], varargs=None, keywords=None, defaults=[\'-1\', \'-1\', \'None\'], "
}
member_method {
name: "DecodeWebP"
argspec: "args=[\'contents\', \'channels\', \'dtype\', \'name\'], varargs=None, keywords=None, defaults=[\'0\', \"<dtype: \'uint8\'>\", \'None\'], "
}
member_method {
name: "DeepCopy"
argspec: "args=[\'x\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "


@@ -40,6 +40,7 @@ load("//third_party/implib_so:workspace.bzl", implib_so = "repo")
load("//third_party/jpeg:workspace.bzl", jpeg = "repo")
load("//third_party/kissfft:workspace.bzl", kissfft = "repo")
load("//third_party/libprotobuf_mutator:workspace.bzl", libprotobuf_mutator = "repo")
load("//third_party/libwebp:workspace.bzl", libwebp = "repo")
load("//third_party/llvm:setup.bzl", "llvm_setup")
load("//third_party/nanobind:workspace.bzl", nanobind = "repo")
load("//third_party/nasm:workspace.bzl", nasm = "repo")
@@ -82,6 +83,7 @@ def _initialize_third_party():
jpeg()
kissfft()
libprotobuf_mutator()
libwebp()
ml_dtypes()
nanobind()
nasm()

third_party/libwebp/BUILD (new file, 3 lines)

@@ -0,0 +1,3 @@
# This empty BUILD file is required to make Bazel treat this directory as a package.
# copybara:uncomment package(default_applicable_licenses = ["//tensorflow:license"])

third_party/libwebp/libwebp.BUILD.bazel (new file, 94 lines)

@@ -0,0 +1,94 @@
licenses(["notice"])
package(default_visibility = ["//visibility:public"])
C89_FLAGS = select({
"@platforms//cpu:x86_32": [
"-msse4.1",
"-DWEBP_HAVE_SSE41",
],
"@platforms//cpu:x86_64": [
"-msse4.1",
"-DWEBP_HAVE_SSE41",
],
"@platforms//cpu:armv7": [
"-marm",
"-mfpu=neon",
],
"//conditions:default": [],
})
cc_library(
name = "webp",
srcs = glob(
[
"src/enc/*.c",
"src/enc/*.h",
"src/dec/*.c",
"src/dec/*.h",
"src/mux/*.c",
"src/mux/*.h",
"src/demux/*.c",
"src/demux/*.h",
"src/dsp/*.c",
"src/dsp/*.h",
],
),
hdrs = [
"src/webp/decode.h",
"src/webp/demux.h",
"src/webp/encode.h",
"src/webp/format_constants.h",
"src/webp/mux.h",
"src/webp/mux_types.h",
"src/webp/types.h",
],
copts = C89_FLAGS,
include_prefix = "third_party/libwebp",
visibility = ["//visibility:public"],
deps = [
":sharpyuv",
":webp_utils",
],
)
cc_library(
name = "webp_utils",
srcs = glob(["src/utils/*.c"]) + [
"src/dsp/cpu.h",
"src/dsp/dsp.h",
"src/dsp/lossless_common.h",
"src/webp/decode.h",
"src/webp/encode.h",
"src/webp/format_constants.h",
"src/webp/types.h",
],
hdrs = glob(["src/utils/*.h"]),
copts = C89_FLAGS,
)
cc_library(
name = "sharpyuv",
srcs = [
"sharpyuv/sharpyuv.c",
"sharpyuv/sharpyuv_cpu.c",
"sharpyuv/sharpyuv_csp.c",
"sharpyuv/sharpyuv_dsp.c",
"sharpyuv/sharpyuv_dsp.h",
"sharpyuv/sharpyuv_gamma.c",
"sharpyuv/sharpyuv_neon.c",
"sharpyuv/sharpyuv_sse2.c",
"src/dsp/cpu.h",
"src/webp/types.h",
],
hdrs = [
"sharpyuv/sharpyuv.h",
"sharpyuv/sharpyuv_cpu.h",
"sharpyuv/sharpyuv_csp.h",
"sharpyuv/sharpyuv_gamma.h",
],
copts = C89_FLAGS,
textual_hdrs = [
"src/dsp/cpu.c",
],
)

third_party/libwebp/workspace.bzl (new file, 13 lines)

@@ -0,0 +1,13 @@
"""Point to the libwebp repo on GitHub."""
load("//third_party:repo.bzl", "tf_http_archive", "tf_mirror_urls")
def repo():
# Use the same libwebp release as tensorstore
tf_http_archive(
name = "libwebp",
strip_prefix = "libwebp-1.4.0",
sha256 = "12af50c45530f0a292d39a88d952637e43fb2d4ab1883c44ae729840f7273381",
urls = tf_mirror_urls("https://github.com/webmproject/libwebp/archive/v1.4.0.tar.gz"),
build_file = "//third_party/libwebp:libwebp.BUILD.bazel",
)