1 Introduction

Unlike image formats such as JPEG or TIFF, a GIF (Graphics Interchange Format) image is composed of a color palette and a matrix of index values. Because of its prevalence in social network applications, the GIF format is well suited to covert communication by hiding secret data in images. GIF images fall into two categories: static GIF images and dynamic GIF images. Dynamic GIFs are more popular in online social networks (OSNs). The animated GIF is a type of dynamic GIF that is often used to enrich social expression and emotional performance. The most popular form is the animated emoji, which is now widely used in OSNs such as WeChat, Twitter, and Weibo.

Steganography embeds secret messages into digital covers without introducing perceptible distortion [1]. Early steganography methods include LSB (Least Significant Bit) replacement and F5 [2]. LSB replacement is the simplest [3]: it stores information in the least significant bit of each pixel, so the human eye cannot perceive the changes. F5 uses matrix embedding to hide secret messages in JPEG images. However, these methods are fragile against modern steganalysis. More recently, the Syndrome Trellis Coding (STC) framework has become popular for steganography [4]; it tries to minimize an additive distortion between the cover and the stego using a predefined distortion function. Generally, the distortion function assigns different distortion costs to different elements of the cover. For spatial images, HILL (HIgh-pass, Low-pass, and Low-pass) [5], WOW (Wavelet Obtained Weights) [6], and S-UNIWARD (Spatial UNIversal Wavelet Relative Distortion) [7, 8] are widely used. Meanwhile, J-UNIWARD (JPEG UNIversal Wavelet Relative Distortion) [7, 8], UED (Uniform Embedding Distortion) [9], UERD (Uniform Embedding Revisited Distortion) [10], and HDS (Hybrid Distortion Steganography) [11] are widely used for JPEG images, where the transform-domain coefficients are modified according to their distortion costs. Besides, adaptive methods have been proposed to guide the modification direction of the coefficients, which often improves the security of the modified image [12,13,14,15].

As the adversary, steganalysis aims to break steganography by analyzing the features of an image to determine whether it contains secret messages [16]. The rapid development of steganalysis poses a great challenge to steganography. Generally, a steganalyzer is a classifier that learns the differences between the features of covers and stegos [17]. The security of a steganography method can be evaluated by the accuracy of such a classifier. Many feature extraction methods exist, e.g., SPAM (Subtractive Pixel Adjacency Model) [18], SRM (Spatial Rich Model) [19], DCTR (Discrete Cosine Transform Residual) [20], and GFR (Gabor Filters Residual) [21].

As GIF images become more widespread, researchers are paying more attention to GIF steganography. The first steganography algorithm for indexed images such as static GIF was proposed in [22]; the scheme searches for the closest color in the palette to reduce the distortion caused by data hiding. In [23], adaptive strategies are proposed to determine which pixels should be modified to embed data. To the best of our knowledge, the first method for embedding data into dynamic GIF images was proposed in [24]. Subsequently, more steganography approaches for animated GIF were proposed [25,26,27,28]. In [28], the researchers propose a framework that embeds data into animated GIF using the differences between adjacent pixels in the same frame. In [29], a method is proposed that hides data in animated emoji GIFs using the STC framework, with improved distortion functions for better security.

In this paper, we propose a steganography scheme for animated emoji using self-reference, in which we integrate the impacts of data embedding on both intra-frame and inter-frame content. We also provide an algorithm for generating a reference image that guides the data embedding. With these algorithms, we achieve better performance against steganalysis. The rest of this paper is organized as follows. We introduce the background of GIF steganography in Section 2. The proposed framework is described in Section 3. Section 4 presents the experimental results and analysis. Section 5 concludes the paper.

2 Preliminaries

Let there be \(K\) frames in an emoji GIF image. Each frame is a color index matrix \({\varvec{I}}\) of size \(M \times N\). The image contains a color palette in which a limited number of colors are represented, e.g., 256 colors for an 8-bit palette.

The pixels are represented by \({\varvec{I}}_{ij}\), where \(i \in \left\{ {1, 2, \ldots , M} \right\}\) and \(j \in \left\{ {1, 2, \ldots , N} \right\}\). The value of each pixel is an index \(l\) into the palette \({\varvec{C}}_{l}\), where \(l \in \left\{ {0, 1, 2, \ldots , 255} \right\}\). Accordingly, an RGB value (\(R_{ij} ,G_{ij} ,B_{ij}\)) can be looked up from the index \({\varvec{I}}_{ij}\). Figure 1 illustrates the composition of an emoji GIF image.

Fig. 1 Illustration of the composition of an emoji GIF image

We denote the cover and the stego images as X and Y, respectively. The pixels are represented by \({\varvec{X}}_{ij}\) and \({\varvec{Y}}_{ij}\). After embedding data into any pixel \({\varvec{X}}_{ij}\) in X, we obtain the pixel \({\varvec{Y}}_{ij}\) and the stego Y. The modification is either binary or ternary. In ternary embedding, each pixel in the stego is \({\varvec{Y}}_{ij} \in \left\{ {{\varvec{X}}_{ij} + 1, {\varvec{X}}_{ij} , {\varvec{X}}_{ij} - 1} \right\}\).

To minimize the change of RGB values caused by embedding modifications, the method in [29] proposes a palette sorting algorithm. First, it calculates the sum of squares of the RGB channel values corresponding to the \(l\)-th index in the palette \(C_{l}\):

$$t_{\left( l \right)} = R\left( l \right)^{2} + G\left( l \right)^{2} + B\left( l \right)^{2}$$
(1)

After sorting the obtained values in ascending order, we obtain a sorted palette \(C_{l} ^{\prime}\). Based on the new palette, we can regenerate a new index matrix \({\varvec{I}}_{k} ^{\prime}\), where \(k\) denotes the \(k\)-th frame.
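For clarity, a minimal sketch of this sorting step is given below, assuming the palette is available as a (256, 3) array of RGB values and each frame as an index matrix; the function and variable names are ours, not from [29].

```python
import numpy as np

def sort_palette(palette, frames):
    """Sort palette entries by t(l) = R^2 + G^2 + B^2 and remap all frames."""
    p = palette.astype(np.int64)
    t = (p ** 2).sum(axis=1)                # eq. (1) for every index l
    order = np.argsort(t, kind="stable")    # ascending sort of t(l)
    sorted_palette = palette[order]
    # old index l must be replaced by its new position in the sorted palette
    remap = np.empty(256, dtype=np.uint8)
    remap[order] = np.arange(256, dtype=np.uint8)
    sorted_frames = [remap[f] for f in frames]
    return sorted_palette, sorted_frames
```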

Let the embedding costs of ternary embedding be \(\rho_{ij}^{ + }\), \(\rho_{ij}\), and \(\rho_{ij}^{ - }\), where \(\rho_{ij} = 0\), \(\rho_{ij}^{ + } \in \left( {0, + \infty } \right)\), and \(\rho_{ij}^{ - } \in \left( {0, + \infty } \right)\). The additive distortion function \(D\left( {{\varvec{X}}, {\varvec{Y}}} \right)\) is the sum of the embedding costs over all pixels.

$$D\left( {{\varvec{X}}, {\varvec{Y}}} \right) = \mathop \sum \limits_{i = 1, j = 1}^{i = M,j = N} \rho_{ij} \left( {{\varvec{X}}_{ij} ,{\varvec{Y}}_{ij} } \right)$$
(2)

To embed a secret message into the cover X, the Syndrome Trellis Coding (STC) framework requires a modification probability \(p_{ij}\) for each pixel. According to [32], the modification probability \(p_{ij}\) is obtained from the embedding cost \(\rho_{ij}\) by (3).

$$p_{ij}^{\left( I \right)} = \frac{{e^{{ - \lambda \rho_{ij} (I)}} }}{{\mathop \sum \nolimits_{{I \in \left\{ { + 1, 0, - 1} \right\}}} e^{{ - \lambda \rho_{ij} (I)}} }}$$
(3)

In (3), the sum runs over the modification set; its size \(\left| I \right|\) is 2 for binary embedding and 3 for ternary embedding. Because the embedding costs \(\rho_{ij}\) are known, substituting the resulting \(p_{ij}\) into the payload constraint (4) yields the parameter λ, where \(m\) is the number of secret bits to be embedded by the data hider.

$$H\left( p \right) = - \mathop \sum \limits_{i = 1, j = 1}^{i = M,j = N} p_{ij} \log p_{ij} = m$$
(4)
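As an illustration (not code from the paper), the following sketch solves (3) and (4) for λ by binary search, which is the usual approach in STC-based embedding simulators; `rho_plus` and `rho_minus` are assumed M × N cost maps, and the cost of keeping a pixel unchanged is zero, as stated above.

```python
import numpy as np

def ternary_probs(lam, rho_plus, rho_minus):
    """Per-pixel probabilities of +1, 0, -1 from eq. (3)."""
    e0, ep, em = 1.0, np.exp(-lam * rho_plus), np.exp(-lam * rho_minus)
    z = e0 + ep + em
    return ep / z, e0 / z, em / z

def entropy_bits(*probs):
    """Total entropy H(p) in bits, summed over all pixels, eq. (4)."""
    h = 0.0
    for p in probs:
        q = np.clip(p, 1e-15, 1.0)
        h -= (p * np.log2(q)).sum()
    return h

def solve_lambda(rho_plus, rho_minus, m, iters=60):
    lo, hi = 1e-6, 1e3          # entropy decreases monotonically in lambda
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_bits(*ternary_probs(mid, rho_plus, rho_minus)) > m:
            lo = mid            # too much entropy: increase lambda
        else:
            hi = mid
    return 0.5 * (lo + hi)
```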

3 Proposed framework

The proposed scheme is depicted in Fig. 2. First, after sorting the palette, we decompose the animated GIF into its frames. For each frame, we retrieve the RGB values of every pixel from the GIF color palette and convert each frame \({{\varvec{I}}}_{k}^{\prime}\) into a color image \({F}_{k}\), where \(k\) is the frame index. Then, we construct a reference frame \({\widehat{F}}_{k}\) for each frame and use it to optimize the embedding costs. We further improve the inter-frame distortion using the previous frame as a reference. After embedding data into each frame, we obtain a stego GIF.

Fig. 2 Overview of the proposed scheme

A. Improved bipolar embedding

Since GIF is a compressed format, a GIF image can be regarded as a 256-color image quantized from a true-color image. Therefore, we can use the original content of the image before GIF compression to improve bipolar embedding. We convert the GIF frames into color images \({\varvec{F}} = \left\{ {{\varvec{F}}_{1} , \ldots ,{\varvec{F}}_{K} } \right\}\) according to the palette, where K is the number of frames. For each frame, every pixel B = (\(R_{ij} ,G_{ij} ,B_{ij}\)) has a corresponding pixel A = \(\left( {\hat{R}_{ij} ,\hat{G}_{ij} ,\hat{B}_{ij} } \right)\) at the same location before GIF compression. Compression is thus equivalent to shifting the RGB value from point A to point B, and vector AB represents the distortion introduced by compression.

When we embed secret messages into \({\varvec{I}}_{ij}\), the pixel index is either increased or decreased by one. In (5), a is the difference vector between A and B in RGB space. We use C = (\(R_{ij}^{ + } , G_{ij}^{ + } , B_{ij}^{ + }\)) or (\(R_{ij}^{ - } , G_{ij}^{ - } , B_{ij}^{ - }\)) to denote the pixel at the same location after embedding. In (6) and (7), the difference vectors caused by the + 1 and − 1 operations are defined as b+ and b, respectively.

$${\varvec{a}} = \left[ {\left( {R_{ij} - \hat{R}_{ij} } \right),\left( {G_{ij} - \hat{G}_{ij} } \right),\left( {B_{ij} - \hat{B}_{ij} } \right)} \right]$$
(5)
$${\varvec{b}}^{ + } { } = \left[ {\left( {R_{ij}^{ + } - R_{ij} } \right),\left( {G_{ij}^{ + } - G_{ij} } \right),\left( {B_{ij}^{ + } - B_{ij} } \right)} \right]$$
(6)
$${\varvec{b}}^{ - } = \left[ {\left( {R_{ij}^{ - } - R_{ij} } \right),\left( {G_{ij}^{ - } - G_{ij} } \right),\left( {B_{ij}^{ - } - B_{ij} } \right)} \right]$$
(7)

Subsequently, we define the modification angle between a and b+ or b as \(\theta^{ + }\) or \(\theta^{ - }\) in (8) and (9). The operator \(\left| \cdot \right|\) denotes the modulus of a vector.

$$\theta^{ + } = \arccos \left( {\frac{{\user2{a } \cdot { }{\varvec{b}}^{ + } }}{{\left| {\varvec{a}} \right| \cdot \left| {{\varvec{b}}^{ + } } \right|}}} \right)$$
(8)
$$\theta^{ - } = \arccos \left( {\frac{{\user2{a } \cdot { }{\varvec{b}}^{ - } }}{{\left| {\varvec{a}} \right| \cdot \left| {{\varvec{b}}^{ - } } \right|}}} \right)$$
(9)

Figure 3 illustrates the cases in which the modification angle is acute or obtuse. Vector AB represents the distortion introduced by compression, and vector BC the distortion caused by data embedding.

Fig. 3 Pixel modification for GIF where the modification angle is: a acute, b obtuse

In Fig. 3a, when the angle \(\theta\) between the two vectors AB and BC is acute, the compression direction agrees with the embedding direction. In other words, the overall error is the embedding error "plus" the compression error. In Fig. 3b, when \(\theta\) is obtuse, the compression and embedding directions oppose each other, so the overall error is the embedding error "minus" the compression error. We denote the overall error AC in Fig. 3 as c+ and c in (10).

$$\left\{ {\begin{array}{*{20}c} {{\varvec{c}}^{ + } = {\varvec{a}} + {\varvec{b}}^{ + } } \\ {{\varvec{c}}^{ - } = {\varvec{a}} + {\varvec{b}}^{ - } } \\ \end{array} } \right.$$
(10)

In summary, when \(\theta^{ + } \in \left( {0, \pi /2} \right)\) or \(\theta^{ - } \in \left( {0, \pi /2} \right)\), \(\left| {{\varvec{c}}^{ + } } \right|\) or \(\left| {{\varvec{c}}^{ - } } \right|\) is larger than every element of \(\left\{ {\left| {\varvec{a}} \right|,\left| {{\varvec{b}}^{ + } } \right|,\left| {{\varvec{b}}^{ - } } \right|} \right\}\). When \(\theta^{ + } \in \left( {\pi /2, \pi } \right)\) or \(\theta^{ - } \in \left( {\pi /2, \pi } \right)\), \(\left| {{\varvec{c}}^{ + } } \right|\) or \(\left| {{\varvec{c}}^{ - } } \right|\) is smaller. Therefore, we prefer pixel modifications whose angle \(\theta^{ + }\) or \(\theta^{ - }\) is obtuse, and restrict data embedding when \(\theta^{ + }\) or \(\theta^{ - }\) is acute.

This strategy reduces the extra error caused by the embedding modification and keeps the stego image closer to the original image, which effectively improves the security of the steganography.
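A small sketch of this angle test, under our own variable naming: A is the reference (pre-compression) RGB value, B the compressed value, and C a candidate modified value; the numeric vectors below are made-up examples.

```python
import numpy as np

def modification_angle(rgb, rgb_ref, rgb_mod):
    """Angle between a = B - A (eq. 5) and b = C - B (eqs. 6-7), per eqs. (8)-(9)."""
    a = np.asarray(rgb, float) - np.asarray(rgb_ref, float)
    b = np.asarray(rgb_mod, float) - np.asarray(rgb, float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return np.pi / 2            # degenerate case: treat as neutral
    cos = np.clip(a.dot(b) / denom, -1.0, 1.0)
    return np.arccos(cos)

# Prefer the direction whose angle is obtuse: the embedding error then
# partially cancels the compression error, so |c| = |a + b| is smaller.
theta_plus = modification_angle((120, 80, 60), (118, 79, 61), (121, 81, 61))
prefer_plus = theta_plus > np.pi / 2
```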

B. Reference construction

According to the above analysis, if the data hider has the original content of the image before GIF compression, better security can be achieved by modifying pixel values toward the original values. However, in most cases the data hider does not have access to the pre-compression content. Therefore, we use an algorithm that constructs a reference image for each frame.

To achieve satisfactory performance, the constructed reference images should be close to the original images before GIF compression. Inspired by [30], we treat image compression as a procedure that adds noise to the original content. To remove this kind of noise, we use the DnCNN model proposed in [31] to construct a reference image. This model has proved useful in many denoising tasks. Unlike existing denoising methods designed for additive white Gaussian noise at a known level, DnCNN can handle Gaussian denoising at unknown noise levels. Moreover, it handles multiple general image restoration tasks, such as Gaussian denoising, single-image super-resolution, and JPEG deblocking.

Denote the luminance of the original image as YOri and that of the GIF-compressed image as YComp. As shown in Fig. 4, the residual image YRes is the difference between the compressed image and the original image in the luminance channel, i.e., YOri = YComp − YRes. In other words, the residual image can be regarded as a kind of image noise. With a residual learning strategy, the residual image can be estimated [31]. DnCNN is trained on the luminance channel because human perception is more sensitive to changes in brightness than to changes in chrominance.

Fig. 4 The generation of the residual image

In Fig. 5, an animated emoji is decomposed into frames, and the GIF-compressed images are converted from RGB to YCbCr space. The DnCNN network is trained to estimate the residual images from the luminance of the color frames. Three colors in the figure represent three types of layers. In the first layer, marked in yellow, 64 filters generate 64 feature maps, followed by a ReLU activation. In layers 2 to (D − 1), marked in blue, 64 filters of size 3 × 3 × 64 are used, with batch normalization inserted between convolution and ReLU; this speeds up training and boosts denoising performance. The orange layer is the last layer, in which filters of size 3 × 3 × 64 reconstruct the output. D is the depth of DnCNN; for image denoising, the number of convolution layers is generally set to 20.

Fig. 5 The architecture of the residual image construction network
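For concreteness, a minimal PyTorch sketch of a DnCNN-style network matching the description above (depth D = 20, 64 feature maps, batch normalization between convolution and ReLU, residual output) might look as follows; this is our reconstruction from [31], not the authors' training code.

```python
import torch.nn as nn

def build_dncnn(depth=20, features=64, channels=1):
    """DnCNN-style residual predictor for a single luminance channel."""
    layers = [nn.Conv2d(channels, features, 3, padding=1),  # first layer
              nn.ReLU(inplace=True)]
    for _ in range(depth - 2):                              # layers 2 .. D-1
        layers += [nn.Conv2d(features, features, 3, padding=1, bias=False),
                   nn.BatchNorm2d(features),                # BN between conv and ReLU
                   nn.ReLU(inplace=True)]
    layers += [nn.Conv2d(features, channels, 3, padding=1)] # last layer
    return nn.Sequential(*layers)                           # predicts Y_Res, not Y_Ori
```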

With the DnCNN model, we can reconstruct an approximately undistorted version of a compressed frame by subtracting the estimated residual from the compressed luminance channel and then converting the image back to the RGB color space.

We denote the \(k\)-th RGB frame of the GIF as \(F_{k}\), which is constructed from \({\varvec{I}}_{k} ^{\prime}\) and \(C_{l} ^{\prime}\). Let the reference frame be \(\hat{F}_{k}\), and let \(Y_{{\hat{F}_{k} }}\) and \(Y_{{F_{k} }}\) be the luminance channels of \(\hat{F}_{k}\) and \(F_{k}\), respectively. The reference luminance \(Y_{{\hat{F}_{k} }}\) is calculated by (11)

$$Y_{{\hat{F}_{k} }} = Y_{{F_{k} }} - {\text{Dn}}\_{\text{CNN}}\left( {Y_{{F_{k} }} } \right)$$
(11)

where \({\text{Dn}}\_{\text{CNN}}\left( \cdot \right)\) denotes the DnCNN denoising network. We concatenate the denoised luminance channel \(Y_{{\hat{F}_{k} }}\) with the original chrominance channels to obtain the denoised image in YCbCr space, and then convert it to RGB to produce the reference image \(\hat{F}_{k}\), which is close to the original image \(\tilde{F}_{k}\) before compression.
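The pipeline of (11) can be sketched as follows, assuming a trained `dncnn` callable that maps a luminance array to the estimated residual; `skimage` is used here only for the color-space conversions.

```python
import numpy as np
from skimage.color import rgb2ycbcr, ycbcr2rgb

def build_reference(frame_rgb, dncnn):
    """Reference frame per eq. (11); frame_rgb is an H x W x 3 float image in [0, 1]."""
    ycbcr = rgb2ycbcr(frame_rgb)
    y = ycbcr[..., 0]
    y_ref = y - dncnn(y)                    # eq. (11): subtract the estimated residual
    # recombine the denoised luminance with the original chrominance channels
    ycbcr_ref = np.dstack([y_ref, ycbcr[..., 1], ycbcr[..., 2]])
    return np.clip(ycbcr2rgb(ycbcr_ref), 0.0, 1.0)
```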

C. Distortion function improvement

Let \(\rho_{ij}^{ + }\) and \(\rho_{ij}^{ - }\) be the embedding costs for the + 1 and − 1 modifications in intra-frame embedding, respectively, where \(i \in \left\{ {1, \ldots ,M} \right\}\) and \(j \in \left\{ {1, \ldots ,N} \right\}\). In many STC-based steganography methods, \(\rho_{ij}^{ + }\) is identical to \(\rho_{ij}^{ - }\). In the proposed method, we improve the embedding cost function according to the RGB values (\(\hat{R}_{ij} ,\hat{G}_{ij} ,\hat{B}_{ij}\)) of the reference image \(\hat{F}_{k}\).

We first initialize the costs \(\rho_{ij}\) for the pixels in each frame using a traditional distortion function such as HILL [5], WOW [6], or UNIWARD [7, 8]. For each pixel, we multiply the original cost \(\rho_{ij}\) by a factor \(\alpha\). There are two cases for the factor when performing the \(\pm 1\) operations, namely \(\alpha^{ + }\) and \(\alpha^{ - }\), defined in (12) and (13). We adjust the distortion function in (14) and (15) by combining the three cases, where wetCost is a very large value, e.g., \(10^{8}\) in our experiments.

$$\alpha^{ + } = \left| {{\varvec{a}} + {\varvec{b}}^{ + } } \right|/\left( {\left| {\varvec{a}} \right| + \left| {{\varvec{b}}^{ + } } \right|} \right)$$
(12)
$$\alpha^{ - } = \left| {{\varvec{a}} + {\varvec{b}}^{ - } } \right|{ }/{ }\left( {\left| {\varvec{a}} \right| + \left| {{\varvec{b}}^{ - } } \right|} \right)$$
(13)
$$\rho_{ij}^{ + } = \left\{ {\begin{array}{ll} {{\text{wetCost }} \quad {\text{if}}\, \theta^{ + } \in \left( {0, \pi /2} \right)} \\ {\rho_{ij} \quad {\text{if}}\, \theta^{ + } = \pi /2} \\ {\alpha^{ + } \cdot \rho_{ij} \quad {\text{if}}\, \theta^{ + } \in \left( {\pi /2, \pi } \right)} \\ \end{array} } \right.$$
(14)
$$\rho_{ij}^{ - } = \left\{ {\begin{array}{ll} {{\text{wetCost }} \quad {\text{if}}\, \theta^{ - } \in \left( {0, \pi /2} \right)} \\ {\rho_{ij} \quad {\text{if}}\, \theta^{ - } = \pi /2} \\ {\alpha^{ - } \cdot \rho_{ij} \quad {\text{if}}\, \theta^{ - } \in \left( {\pi /2, \pi } \right)} \\ \end{array} } \right.$$
(15)
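A vectorized sketch of (12)–(15) under our own naming: `rho` is the initial cost map from HILL/WOW/UNIWARD, and `a` and `b` are the H × W × 3 difference arrays of (5)–(7) (or of (16) for the inter-frame case below); the toy data at the end is only for illustration.

```python
import numpy as np

WET_COST = 1e8

def adjust_costs(rho, a, b):
    """Adjusted cost map for one modification direction, per (14)/(15)."""
    dot = (a * b).sum(axis=-1)
    norm = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    cos = np.divide(dot, norm, out=np.zeros_like(dot), where=norm > 0)
    alpha = (np.linalg.norm(a + b, axis=-1)
             / (np.linalg.norm(a, axis=-1) + np.linalg.norm(b, axis=-1) + 1e-12))
    out = np.where(cos < 0, alpha * rho, rho)   # obtuse angle: scale cost by alpha
    return np.where(cos > 0, WET_COST, out)     # acute angle: forbid the change

rng = np.random.default_rng(0)
rho = rng.random((4, 4)) + 0.1                  # toy initial costs
a = rng.normal(size=(4, 4, 3))                  # toy difference vectors, eq. (5)
b_plus = rng.normal(size=(4, 4, 3))             # toy difference vectors, eq. (6)
rho_plus = adjust_costs(rho, a, b_plus)         # eq. (14); eq. (15) is analogous
```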

On the other hand, data hiding also changes the differences between adjacent frames, so we must consider the impact of inter-frame embedding. For each frame \(F_{k}\), we use the previous frame \(F_{k - 1}\) as a reference. The RGB values (\(R_{ij}^{k - 1} ,G_{ij}^{k - 1} ,B_{ij}^{k - 1}\)) from \(F_{k - 1}\) guide the modification of the current frame. The procedure is the same as for intra-frame embedding except that a is redefined in (16) and the inter-frame embedding costs are denoted \(\dot{\rho }_{ij}^{ + }\) and \(\dot{\rho }_{ij}^{ - }\) in (17) and (18). Unlike a in (5), the vector a redefined in (16) represents the change of the RGB values at the same position between adjacent frames. We then obtain \(\alpha^{ + }\) and \(\alpha^{ - }\) by applying (12) and (13) and update the distortion function as in (17) and (18).

$${\varvec{a}} = \left[ {\left( {R_{ij} - R_{ij}^{k - 1} } \right),\left( {G_{ij} - G_{ij}^{k - 1} } \right),\left( {B_{ij} - B_{ij}^{k - 1} } \right)} \right]$$
(16)
$$\dot{\rho }_{ij}^{ + } = \left\{ {\begin{array}{ll} {{\text{wetCost}}\quad {\text{if}}\, \theta^{ + } \in \left( {0, \pi /2} \right)} \\ {\rho_{ij}\quad {\text{if}}\, \theta^{ + } = \pi /2} \\ {\alpha^{ + } \cdot \rho_{ij}\quad {\text{if}}\, \theta^{ + } \in \left( {\pi /2, \pi } \right)} \\ \end{array} } \right.$$
(17)
$$\dot{\rho }_{ij}^{ - } = \left\{ {\begin{array}{ll} {{\text{wetCost}}\quad {\text{if}}\, \theta^{ - } \in \left( {0,\pi /2} \right)} \\ {\rho_{ij}\quad {\text{if}}\, \theta^{ - } = \pi /2} \\ {\alpha^{ - } \cdot \rho_{ij}\quad {\text{if}}\, \theta^{ - } \in \left( {\pi /2,\pi } \right)} \\ \end{array} } \right.$$
(18)

Finally, we combine the intra-frame and inter-frame costs to obtain the final distortion function in (19).

$$\left\{ {\begin{array}{*{20}c} {\overline{{\rho_{ij}^{ + } }} = \dot{\rho }_{ij}^{ + } \times \rho_{ij}^{ + } } \\ {\overline{{\rho_{ij}^{ - } }} = \dot{\rho }_{ij}^{ - } \times \rho_{ij}^{ - } } \\ \end{array} } \right.$$
(19)
D. Payload allocation

For most animated GIFs, the content differs from frame to frame. To improve the security of each frame, we adaptively allocate different payloads to the frames according to their characteristics. We adopt the algorithm proposed in [32], in which an \(m\)-bit secret message is embedded into \(n\) covers with minimized distortion. The distortion is calculated in (20),

$$D_{{{\text{min}}}} \left( {m,n,\rho } \right) = \mathop \sum \limits_{i = 1,j = 1}^{i = M,j = N} \rho_{ij} p_{ij}$$
(20)

and the optimization problem is defined in (21),

$$\begin{gathered} \mathop {\min }\limits_{p} \; D_{{{\text{min}}}} \left( {m,n,\rho } \right) \hfill \\ {\text{subject to }} H\left( p \right) = m \hfill \\ \end{gathered}$$
(21)

In this paper, the optimization problem is defined as (22),

$$\begin{gathered} \mathop {\min }\limits_{{\overline{p}}} \; D_{{{\text{min}}}} \left( {m,n,\overline{\rho }} \right) = \mathop \sum \limits_{k = 1}^{n} \overline{\rho }_{k} \overline{p}_{k} \hfill \\ {\text{subject to }} \mathop \sum \limits_{k = 1}^{n} H\left( {\overline{p}_{k} } \right) = m \hfill \\ \end{gathered}$$
(22)

where \(n\) is the total number of selected GIF frames, \(\overline{\rho }_{k}\) is the embedding cost of the \(k\)-th frame, and \(\overline{p}_{k}\) is the modification probability of the \(k\)-th frame. After calculating the distortion function, we substitute the embedding costs and the total payload into the constraint in (22) and obtain the modification probability of each frame. The embedding payload of the \(k\)-th frame is then given by (23)

$$m_{k} = H\left( {\overline{p}_{k} } \right)$$
(23)
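A sketch of this allocation, under the simplifying assumption of binary embedding with symmetric costs: a single λ is searched so that the total entropy across all frames equals m (the constraint in (22)), and each frame's payload m_k then follows from (23).

```python
import numpy as np

def binary_entropy_bits(p):
    q = np.clip(p, 1e-15, 1 - 1e-15)
    return -(q * np.log2(q) + (1 - q) * np.log2(1 - q)).sum()

def allocate_payload(frame_costs, m, iters=60):
    """frame_costs: list of per-frame cost maps; m: total payload in bits."""
    def total_entropy(lam):
        probs = [np.exp(-lam * rho) / (1.0 + np.exp(-lam * rho))
                 for rho in frame_costs]
        return sum(binary_entropy_bits(p) for p in probs), probs
    lo, hi = 1e-6, 1e3
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h, _ = total_entropy(mid)
        if h > m:
            lo = mid                     # entropy too high: raise lambda
        else:
            hi = mid
    _, probs = total_entropy(0.5 * (lo + hi))
    return [binary_entropy_bits(p) for p in probs]   # m_k per frame, eq. (23)
```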

4 Experimental results

To verify the proposed framework, we conducted experiments on the emoji GIF dataset provided by [29], which contains 560 animated GIFs. Several examples are shown in Table 1. These GIF images use an 8-bit palette, i.e., each image contains up to 256 colors.

Table 1 Several examples of the dataset

We use binary pseudo-random sequences as the hidden data, i.e., zeros and ones occur with equal probability. We use the popular HILL, UNIWARD, and WOW as the initial distortion functions, and generate the reference images with DnCNN. We name the proposed steganography methods built on HILL, UNIWARD, and WOW as PD-HILL, PD-UNIWARD, and PD-WOW, respectively.

The embedding is performed with the STC framework. The amount of secret data embedded in each frame is set to 600, 700, 800, 900, 1000, and 1100 bits, respectively. We also use payloads of 0.05 bpp, 0.1 bpp, 0.15 bpp, 0.2 bpp, and 0.25 bpp for further comparison.

For steganalysis, we use the ensemble classifier with the SPAM and SRMQ1 feature sets. Half of the cover-stego pairs are used for training and the other half for testing. The minimal total error \(P_{{\text{E}}}\) is used as the criterion to evaluate steganographic security. In (24), \(P_{FA}\) is the false-alarm rate and \(P_{MD}\) is the missed-detection rate. The average \(P_{{\text{E}}}\) over 10 random splits is reported [17].

$$P_{{\text{E}}} = \mathop {\min }\limits_{{P_{FA} }} \left( {\frac{{P_{FA} + P_{MD} }}{2}} \right)$$
(24)
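As an illustration of (24), a minimal computation of \(P_{{\text{E}}}\) from classifier output scores might look like this; `cover_scores` and `stego_scores` are assumed 1-D arrays where larger scores indicate "stego".

```python
import numpy as np

def minimal_total_error(cover_scores, stego_scores):
    """Sweep a decision threshold and return P_E per eq. (24)."""
    thresholds = np.concatenate(
        [np.unique(np.concatenate([cover_scores, stego_scores])), [np.inf]])
    best = 1.0
    for t in thresholds:
        p_fa = (cover_scores >= t).mean()    # covers flagged as stego
        p_md = (stego_scores < t).mean()     # stegos missed
        best = min(best, 0.5 * (p_fa + p_md))
    return best
```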

When testing the security of intra-frame steganography, we convert every frame of the GIF into a color image and then transform it into a grayscale image; SPAM or SRMQ1 features are then extracted. When testing the security of inter-frame steganography, we calculate the difference between adjacent frames and use it for the subsequent steganalysis.

Table 2 shows an embedding test on an emoji image with different payloads and algorithms. The original HILL method is not effective for embedding a large payload in a GIF because of the simple textures: obvious salt-and-pepper noise appears in the smooth areas. When embedding with the method in [29], salt-and-pepper noise appears along the edges of the image. With our method, there is no obvious noise in either the edge or the texture areas.

Table 2 Embedding test for an animated GIF under different payloads and algorithms

To show the effectiveness of the proposed framework, we use the same experimental settings as [29]. The proposed PD-HILL, PD-WOW, and PD-UNIWARD are used to embed the same amount of message into the same dataset. Table 3 shows the testing errors of PD-HILL, [29], and HILL against SPAM and SRMQ1. The results show that the proposed method provides better visual quality as well as better security.

Table 3 Testing errors of the PD-HILL, [29] and HILL against SPAM and SRMQ1 under low capacity

We further apply larger embedding payloads, i.e., 0.05 bpp to 0.25 bpp. Many GIF images cannot accommodate such large payloads with HILL, so we only compare our method with [29]. In Fig. 6, we use different initial distortion functions. The results show that the proposed method outperforms [29] in most cases.

Fig. 6 Comparisons using a HILL, b WOW, and c UNIWARD

Finally, we conduct the inter-frame security experiments in comparison with [29]. We use the ensemble classifier to calculate the \({P}_{\mathrm{E}}\) between frames. Table 4 shows the inter-frame testing errors of PD-HILL and [29]. The proposed method achieves better performance.

Table 4 Inter-frame testing errors of PD-HILL and [29] against SPAM and SRMQ1

5 Conclusions

In this paper, we propose an improved steganography method for animated emoji using self-reference. We first construct reference images with the DnCNN network. Guided by the reference images, we adaptively modify the pixels according to the modification angles of the RGB difference vectors induced by the + 1 and − 1 operations. We further use the previous frame as a reference to improve the security of steganography between frames. Several typical distortion functions such as HILL are used, and the embedding is performed by the STC framework. Experimental results show that the proposed method outperforms state-of-the-art steganography methods for animated emoji images in security.