
Multimedia Tools and Applications, Volume 66, Issue 2, pp 247–266

A practical design of high-volume steganography in digital video files

  • Po-Chyi Su
  • Ming-Tse Lu
  • Ching-Yu Wu
Article

Abstract

In this research, we consider exploiting the large volume of audio/video data streams in compressed video clips/files for effective steganography. By observing that most of the distributed video files employ H.264 Advanced Video Coding (AVC) and MPEG Advanced Audio Coding (AAC) for video/audio compression, we examine the coding features in these data streams to determine appropriate data for modification so that the reliable high-volume information hiding can be achieved. Such issues as the perceptual quality, compressed bit-stream length, payload of embedding, effectiveness of extraction and efficiency of execution will be taken into consideration. First, the effects of using different coding features are investigated separately and three embedding profiles, i.e. High, Medium and Low, which indicate the amount of payload, will then be presented. The High profile is used to embed the maximum amount of hidden information when the high payload is the only major concern in the target application. The Medium profile is recommended since it is designed to achieve a good balance among several requirements. The Low profile is an efficient implementation for faster information embedding. The performances of these three profiles are reported and the suggested Medium profile can hide more than 10% of the compressed video file size in common Flash Video (FLV) files.

Keywords

Steganography, H.264/AVC, MPEG AAC, Information hiding

1 Introduction

Digital videos are now widely available thanks to rapid advances in increasingly cheap yet powerful computing hardware and broadband Internet technologies. High-quality videos can be streamed over the Internet, and web sites such as YouTube, Yahoo! Video and DailyMotion offer free video viewing and sharing services. As portable devices become more popular, watching videos anytime and anywhere may become a daily activity. As a result, digital videos are ubiquitous and will be the major form of circulated multimedia content. Before digital videos are transmitted or stored, compression has to be applied to reduce their large data volume. Since human perception is not perfect, lossy compression is usually preferred because it increases the coding efficiency without visibly affecting the content. In other words, digital video files contain a certain amount of redundancy. From the viewpoint of communication, this redundancy can serve as an “invisible” channel: if one makes good use of it, a high-volume message may be transmitted with digital videos as the camouflage. Such secret communication is termed “steganography”, which means “covered writing”, and can be applied to transmit sensitive information between trusted parties, or when encryption is not allowed or not safe in the normal communication channel. Steganography has several requirements, including a high payload of hidden information, imperceptible distortion, security and reliability. Digital video files can serve as good hosts for practical covert communication, especially because they are available to most people and their transmission is increasingly popular. This research aims at developing a steganographic scheme for popular digital video files.

H.264/Advanced Video Coding (AVC) is the state-of-the-art video codec, and its coding performance has made it the major coding mechanism in various applications. The most popular digital video formats/containers for file sharing, including FLV (Flash Video), MKV (Matroska Multimedia Container), AVI (Audio Video Interleave) and MP4, support H.264/AVC, so we choose video files employing H.264/AVC as the “stego” host. In addition, since FLV has become very popular in video file sharing, we wrap the resulting H.264/AVC video bit-stream into an FLV file for the subsequent transmission. As FLV files contain both video and audio data streams, we employ both streams to embed as much secret information as possible. The chosen audio format is MPEG Advanced Audio Coding (AAC), which is usually adopted by FLV files. The scenario considered in this research is as follows. First, a user acquires a video file compressed by a popular video codec, such as MPEG-2 or MPEG-4, and uses this video content as the stego host. Since our scheme generates the stego H.264/AVC-based video in FLV format, the incoming video file is transcoded into an H.264/AVC bit-stream so that the secret information can be embedded during the re-encoding process. In other words, both a popular video decoder and an H.264/AVC encoder are required for the information hiding. It should be noted that expanding the compressed video bit-stream into raw frames is still necessary even when the input video was already encoded by H.264/AVC. In this case, however, the re-encoding process along with the embedding procedure may be sped up since the existing coding parameters in the incoming video file can be referenced. The stego FLV video is then transmitted to the trusted party, and the hidden information should be extracted efficiently and reliably from the partially decoded bit-stream.
To achieve high-volume steganography while retaining the properties of the original audio/visual data, we investigate combinations of embedding methods that satisfy most of the requirements or restrictions. This paper is organized as follows. Previous works are described in Section 2 and our proposed scheme is detailed in Section 3. Experimental results are shown in Section 4 to validate the trade-offs that we make among the different requirements. Concluding remarks are given in Section 5.

2 Review of the related works

Digital watermarking for copyright protection usually requires the embedded information to withstand common attacks such as transcoding at a different bit-rate, random video frame dropping and resizing. High-volume steganography, in contrast, places more emphasis on the payload, the reliability and the difficulty of detection by steganalysis [2], which is the process of revealing the existence of hidden information in a suspicious video. The quality of the video should always be maintained to avoid affecting its target applications. Many data hiding schemes for digital videos [3, 4, 6, 9, 10, 11] have been proposed. As a video file consists of a large number of frames, data hiding techniques for still images may also be applied to videos. The most widely used technique for hiding information in digital images or video data is to modify the Least-Significant Bit (LSB) of samples [1, 8], which are usually coefficients or quantization indices if compressed images/videos are considered. JStego, F3, F4 and F5 [8] are popular approaches for high-volume steganography. In the original design of JStego, the LSBs of DCT coefficients in JPEG images are overwritten with the binary secret message consisting of “0” and “1”. JStego skips samples equal to 0 and 1 to avoid generating zero values, which would cause ambiguity in the extraction of hidden information. Other values are grouped into pairs, e.g. ..., (−2, −1), (2, 3), (4, 5), ..., whose two members represent the binary values (0, 1). The advantage of JStego is that the payload can be known in advance and the information can be embedded by changing the LSB directly. In F3, the LSB of each non-zero coefficient is matched with the secret message bit by decreasing the absolute value of the coefficient when necessary. If the coefficient becomes zero after this modification, the bit is embedded once again in the next sample. The advantage of F3 is that the resultant JPEG file size may be reduced.

F4 was developed to remedy some weaknesses of JStego and F3. The quantized coefficients in a “clean” JPEG image have more odd values than even values, and their histogram shows that the number of coefficients in each bin decreases as the bin index increases. After information embedding by JStego or F3, this property is affected. In F4, the binary hidden information is represented inversely for negative coefficients so that the above-mentioned property is preserved after the information hiding. In F5, permutation and matrix coding are applied to improve the perceptual quality and enhance the security level.

In addition to directly operating on the coefficients, some researchers employ the characteristics of popular video compression standards for information hiding. Fang et al. [3] proposed to embed the data into motion vectors’ phase angles in the inter-frames. Wang et al. [7] utilized motion vectors in P and B-pictures as the data carriers for hiding the copyright information. The magnitudes of motion vectors will decide whether the information hiding will be applied and the directions of motion vectors will determine the modification operations. Yang et al. [12] employed the intra-prediction modes and matrix coding. They map the two secret message bits to every three intra 4×4 blocks by the matrix coding. Kim et al. [2] proposed to embed the information bits by modifying the sign bits of the trailing ones in the context-adaptive variable length coding (CAVLC) of H.264/AVC bit-stream. The transcoding process may thus be avoided but the drift errors resulting from the different reference frame content may appear and degrade the video quality.

Compared with the existing works, our scheme tries to employ all the available coding features in video/audio data streams to maximize the capacity/payload of embedded data in popular video files. The constraints of perceptual quality, the resulting file size and the possibility of detection may help us determine a good combination of tools to achieve the effective steganography.

3 The proposed scheme

3.1 System overview

The block diagram of the proposed scheme is shown in Fig. 1. A video file is parsed first to extract the video and audio data streams. As the transcoding will be applied, the video and audio decoder will expand the compressed bit-streams into raw data. H.264/AVC and MPEG AAC encoders will compress the raw data right after certain parts of them are available and the embedding procedure will be triggered. If the input video file was already processed by H.264/AVC, an efficient implementation may be achievable since the coding modes in the original H.264/AVC file can be referenced. Some procedures of determining the coding modes in the transcoding can thus be skipped for speeding up the whole process. The stego audio-visual data will be packed to form the popular FLV files for transmission or storage. The extraction of hidden information is executed by reading the data from the partially decoded video and audio bit-streams. It is worth noting that our scheme is designed to simplify the extraction of hidden information in the decoder. Since the data volume of the video stream is much larger and most of the secret message will be embedded in it, we will discuss the issues of steganography in H.264/AVC first. Then, we will describe the procedures of information hiding in MPEG AAC as we try to leverage every piece of data in the video file to achieve the high-volume information hiding.
Fig. 1

The flowchart of the proposed scheme

H.264/AVC significantly outperforms most existing video codecs thanks to its various encoding tools. Like previous video coding standards, H.264/AVC is based on motion-compensated prediction and DCT-like transform coding. Each picture is compressed by partitioning it into one or more slices; each slice consists of macroblocks, which are blocks of 16 × 16 luma samples with the corresponding chroma samples. Each macroblock may also be divided into sub-macroblock partitions for motion prediction. The prediction partitions can have seven different sizes, i.e. 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4. The large variety of partition shapes and the quarter-sample motion compensation provide enhanced prediction accuracy. In intra-coded slices, 4 × 4 or 16 × 16 intra spatial prediction based on neighboring decoded pixels in the same slice is applied. The spatial transform, which approximates the DCT and can be implemented with integer additions and shifts, is calculated for the residual data. The point-by-point multiplication in the transform step is combined with the quantization step and implemented by simple shifting operations for efficiency. CAVLC or CABAC (Context-Adaptive Binary Arithmetic Coding) is then used to losslessly code the data. Our video embedding scheme is integrated into the H.264/AVC encoding process by modifying the quantization, intra prediction and motion estimation procedures in the encoder. The effects of modifying each feature are analyzed as follows.

3.2 Steganography by employing quantized coefficients

H.264/AVC employs intra and inter prediction to reduce the prediction residuals that have to be further processed by the entropy coding. However, the prediction residuals still occupy the largest portion of the data in the compressed video stream. We therefore first consider hiding information in the quantized coefficients of prediction residuals to achieve high-volume steganography without severely affecting the perceptual quality. In our scheme, both intra and inter residuals in luma and chroma are processed, and popular methodologies of JPEG-based schemes, such as JStego, F3, F4 and F5, may be employed. As JStego and F3 modify the histogram of quantized coefficients, so that a careful steganalysis may reveal the existence of hidden information, F4 and F5 are preferred. In our scheme, F4 as illustrated in Algorithm 1 is adopted. For each non-zero quantized AC coefficient coe, if it is positive and its LSB is not equal to BitToEmbed, its absolute value is decreased by one. For a negative quantized coefficient, the modification is applied when its LSB is equal to BitToEmbed, and the coefficient is increased by 1, which again decreases its absolute value. In other words, the LSB of a negative value should be equal to the inverted target bit. After the embedding operation, we have to check whether the resultant index has become 0. If so, the decoder will skip this sample, so the same bit must be embedded once again in the following sample.
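The F4 rule described above can be sketched in a few lines of Python. This is only an illustration written from the prose description, not a reproduction of Algorithm 1: the function names `f4_embed` and `f4_extract` and the flat list of AC indices are assumptions made for the example, and a real encoder would apply the rule inside its quantization loop.

```python
from typing import List, Sequence, Tuple


def f4_embed(coeffs: Sequence[int], bits: Sequence[int]) -> Tuple[List[int], int]:
    """Embed bits into quantized AC indices; return (stego indices, bits written)."""
    out = list(coeffs)
    k = 0  # index of the next message bit
    for i, c in enumerate(out):
        if k == len(bits):
            break
        if c == 0:
            continue  # zero indices never carry data
        lsb = abs(c) & 1
        if c > 0 and lsb != bits[k]:
            c -= 1  # positive value: LSB must equal the bit
        elif c < 0 and lsb == bits[k]:
            c += 1  # negative value: LSB must equal the inverted bit
        out[i] = c
        if c != 0:
            k += 1  # a zero result is skipped by the decoder, so the bit is retried
    return out, k


def f4_extract(coeffs: Sequence[int], n_bits: int) -> List[int]:
    """Read n_bits back from the nonzero indices."""
    bits: List[int] = []
    for c in coeffs:
        if len(bits) == n_bits:
            break
        if c == 0:
            continue
        lsb = abs(c) & 1
        bits.append(lsb if c > 0 else 1 - lsb)
    return bits
```

In this sketch the extractor is told the message length; how the length is signalled in practice is outside what the text describes.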

The reason for choosing F4 over F5 is as follows. F5 is considered a better approach because its permutation and matrix coding require less data to be modified. However, as we would like to embed the information during the encoding process for efficiency, we have to finish coding the data in one subblock before we can proceed to encode the next subblock. In matrix encoding, we need to collect 2^m − 1 samples to embed m bits by changing at most one sample. Since the prediction mechanism of H.264/AVC performs well, many zero indices appear and several subblocks may thus be required to collect 2^m − 1 nonzero samples for the information hiding. As the efficiency of information hiding is one of our major concerns, we choose the relatively straightforward F4 algorithm. It should also be noted that, if a higher degree of security is required in F4, some quantized coefficients may be skipped during embedding, provided that both the embedder and detector know the rule. This is ultimately a trade-off between the security and the payload of hidden information.
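To make the matrix-coding requirement concrete, the following sketch embeds m bits into the LSBs of 2^m − 1 samples while changing at most one of them. It uses the usual Hamming-syndrome construction and works on bare LSBs only; the function names are hypothetical, and F5's handling of coefficient magnitudes and of samples that shrink to zero is deliberately left out.

```python
from typing import List, Sequence


def syndrome(lsbs: Sequence[int]) -> int:
    """XOR of the 1-based positions whose LSB is 1."""
    s = 0
    for pos, b in enumerate(lsbs, start=1):
        if b:
            s ^= pos
    return s


def matrix_embed(lsbs: Sequence[int], message: int, m: int) -> List[int]:
    """Hide an m-bit message in 2**m - 1 LSBs by flipping at most one of them."""
    n = (1 << m) - 1
    if len(lsbs) != n or not 0 <= message < (1 << m):
        raise ValueError("need 2**m - 1 samples and an m-bit message")
    out = list(lsbs)
    flip = syndrome(out) ^ message  # 0 means the samples already encode the message
    if flip:
        out[flip - 1] ^= 1
    return out


def matrix_extract(lsbs: Sequence[int]) -> int:
    """The message is simply the syndrome of the received LSBs."""
    return syndrome(lsbs)
```

With m = 3, seven nonzero samples must be gathered before three bits can be written, which illustrates why sparse subblocks make this approach awkward inside the encoding loop.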

The information hiding by F4 has a positive side-effect in video coding since the magnitudes of the resultant coefficients tend to become smaller. That is, when a fixed Quantization Parameter (QP) is used to encode a video, the video size may be reduced after the information embedding. Table 1 lists statistics of the information hiding in the intra and inter residuals to show how much the file size decreases when the quantized coefficients are modified. The test CIF (Common Intermediate Format) videos, each with 300 frames, are “Container”, “Hall Monitor”, “Stefan” and “Mobile”, which are shown in Fig. 2. We use a fixed QP = 30 and a GOP size of 15 frames, in which only I and P frames are used. The payload columns show the amount of embedded data in bytes for the intra and inter residuals respectively, and the size columns show the corresponding size reduction of the intra and inter residuals. Since “Container” and “Hall Monitor” are relatively static videos, the number of embedded bits is smaller, but we can still embed a few hundred bits per frame on average. The active content in “Stefan” and “Mobile” allows more nonzero coefficients to be modified, especially in the inter residuals. It should be noted that here we embed the information in the intra and inter residuals separately to show their effects. The file size reduction is between roughly 13% and 21% of the original data size of the intra and inter residuals. This positive side-effect from the modification of quantized indices may help to alleviate the increase in video file size caused by the information hiding in the motion vectors and intra modes described later. Note, however, that the information embedding in the intra residuals will eventually increase the size of the inter residuals because of the noise added to the referenced I frames. When the rate control mechanism is enabled, different QP values may be assigned during the encoding process. The F4 algorithm helps to save some bits in the current frame so that the following frames can be assigned smaller QP values, which not only preserves the frame content better but also generates more nonzero indices and thus further raises the payload of hidden information.
Table 1

The data size decreases from the information embedding in quantized coefficients

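| Video | Intra payload (bytes) | Intra size change (%) | Inter payload (bytes) | Inter size change (%) |
|---|---|---|---|---|
| Container | 15,306 | −19.01 | 12,735 | −18.71 |
| Hall monitor | 9,893 | −15.14 | 12,779 | −20.91 |
| Stefan | 70,979 | −12.59 | 64,691 | −20.13 |
| Mobile | 48,146 | −17.90 | 119,762 | −19.86 |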

Fig. 2

The four test videos: a Container, b Hall monitor, c Stefan and d Mobile

If the size of the resultant stego video is not a major concern, we may generate more nonzero indices for information hiding. Some coefficients are quantized to zero only because their magnitudes are barely smaller than half the current quantization step-size, and we may leverage those coefficients to increase the payload. For example, if QP_c is currently adopted in a subblock, we may try a smaller value QP_s = QP_c − 1 to see whether any zero coefficients/indices become nonzero. If so, those indices may be changed to ±1 so that they can be considered for the information hiding.
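The test with QP_s = QP_c − 1 can be illustrated with a simplified scalar quantizer. The sketch below assumes round-to-nearest quantization and the nominal H.264/AVC step sizes, which double for every increase of 6 in QP; an actual encoder works with integer scaling tables and a rounding offset, so the function names and the floating-point arithmetic here are illustrative assumptions only.

```python
# Nominal step sizes for QP 0..5; the step size doubles every 6 QP values.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(value: float, qp: int) -> int:
    """Round-to-nearest scalar quantization (a simplification)."""
    mag = int(abs(value) / qstep(qp) + 0.5)
    return mag if value >= 0 else -mag


def promote_zero_index(value: float, qp_c: int) -> int:
    """Return the index at QP_c, promoting a near-miss zero to +1 or -1."""
    idx = quantize(value, qp_c)
    if idx == 0 and qp_c > 0 and quantize(value, qp_c - 1) != 0:
        return 1 if value > 0 else -1
    return idx
```

At QP_c = 30 the nominal step is 20, so a transformed value of 9.5 is normally quantized to zero; at QP_s = 29 the step is 18 and the same value yields a nonzero index, so it would be promoted to 1 and become available for embedding.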

3.3 Steganography by employing inter prediction

Inter prediction exploits the temporal redundancy in videos by providing references from one or more previously encoded frames. To acquire precise motion vectors, H.264/AVC adopts quarter-pixel precision for motion compensation, so the last two bits of each motion vector component represent the quarter-sample units. In our scheme, we basically employ the last bit of motion vectors for information hiding and try to avoid severely affecting the coding performance. Because the motion vectors of neighboring partitions are often highly correlated, after determining a motion vector by motion estimation, H.264/AVC predicts it from the nearby, previously coded partitions to obtain a predicted motion vector, MV_predicted. The difference between the current motion vector, MV_current, and MV_predicted is then calculated and encoded. This motion vector difference is termed “MVD”, and MV_predicted is formed in the same way at the decoder.
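As a small illustration of how MVD relates to MV_current and MV_predicted, the sketch below forms the prediction as the component-wise median of three neighboring motion vectors. The median rule is the common case in H.264/AVC; the special cases for 16 × 8 and 8 × 16 partitions and for unavailable neighbors are omitted, and all names are chosen for this example.

```python
from typing import Tuple

MV = Tuple[int, int]  # (horizontal, vertical) in quarter-sample units


def median3(a: int, b: int, c: int) -> int:
    return sorted((a, b, c))[1]


def predict_mv(left: MV, top: MV, top_right: MV) -> MV:
    """Component-wise median of three previously coded neighbors."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))


def mvd(current: MV, predicted: MV) -> MV:
    """Encoder side: only the difference is written to the bit-stream."""
    return (current[0] - predicted[0], current[1] - predicted[1])


def reconstruct(difference: MV, predicted: MV) -> MV:
    """Decoder side: the same prediction plus the decoded difference."""
    return (difference[0] + predicted[0], difference[1] + predicted[1])
```

Because the decoder repeats the same prediction, any bit hidden in the parity of the MVD can be read back without access to the original video.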

A straightforward implementation of information hiding by using inter prediction is to substitute the quarter unit of motion vector directly. However, this approach may lead to the degradation of coding performance considerably because of the increased inter residuals by the relatively inaccurate motion compensation. Since the transcoding is always applied in our scheme for the information embedding, the Sum of Absolute Difference (SAD) of the investigated/candidate motion vectors with their LSB being equal to the hidden bit will be available. Therefore, a more delicate method is to examine these SAD values to determine a good motion vector for hiding the two bits in the horizontal and vertical direction. Here, we illustrate the performance of this approach by using a fixed QP = 30 to encode the 300-frame CIF video “Container”. Table 2 shows the effect of bit-stream size increment of encoded elements. Although the payload of embedding can reach 24,920 bytes, the size of increment in such elements as “Luma from Inter” and “MVD” is still huge. The enlarged “MVD” indicates that the resulting motion vectors carrying the secret message are less regular with their neighboring motion vectors so more bits are required for coding MVD. The data size of “Else” in Table 2, which indicates the data volume in the headers, is decreased by around 40% instead because the coding structure is affected by the increased MVD.
Table 2

The data size increment of each element caused by MV embedding (container video; fixed QP = 30)

| Encoded element | Bytes in transcoded file | Bytes in embedded file | Increment (%) |
|---|---|---|---|
| Luma from intra | 136,490 | 147,443 | 8.02 |
| Luma from inter | 143,390 | 528,442 | 268.53 |
| Chroma | 15,569 | 17,898 | 14.95 |
| MVD | 16,395 | 74,070 | 351.78 |
| Intra 4 × 4 mode | 16,994 | 18,172 | 6.93 |
| Else | 40,485 | 24,659 | −39.09 |
| All | 369,323 | 839,549 | 127.32 |
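The SAD-guided selection used for the MV embedding evaluated in Table 2 might look like the following sketch. It assumes that motion estimation has already produced a SAD value for every candidate vector and that one bit is carried by the LSB of each component; the dictionary layout and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) in quarter-sample units


def choose_stego_mv(candidates: Dict[MV, int], bits: Tuple[int, int]) -> Optional[MV]:
    """Return the lowest-SAD candidate whose component LSBs equal the two bits."""
    best: Optional[MV] = None
    for mv, sad in candidates.items():
        # In Python, x & 1 gives the parity for negative integers as well.
        if (mv[0] & 1, mv[1] & 1) != bits:
            continue
        if best is None or sad < candidates[best]:
            best = mv
    return best
```

For example, with candidates {(4, 2): 100, (5, 3): 110, (3, 3): 105} and bits (1, 1), the vector (3, 3) is chosen even though (4, 2) has the smallest SAD, which is the source of the larger inter residuals.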

In our scheme, we first skip the information embedding when the motion vectors are zero vectors as there should be little content change at the same location of partition in adjacent frames. Besides, we choose to examine the data of MVD for information embedding and will skip the partition with its MVD equal to 0 from embedding. This strategy can limit the file size increment and keep the original property of the correlated motion vectors. In our implementation, we will avoid generating zero MVD after the information embedding to maintain the payload of hidden information. If only the motion vector whose MVD becomes zero after the information embedding has a reasonably small SAD, we will choose this motion vector and embed the same bit once again in the following partition. Table 3 demonstrates the result of MVD embedding in “Container”, in which the payload of embedding is around 2,218 bytes. By comparing Tables 2 and 3, we can see that the size increments in other coded elements are controlled. Although the data size of inter residuals is still increased by 58.84%, we think that the increased nonzero indices in the resultant inter residuals may raise the payload of the information hiding in quantized coefficients, which will reduce the data size as an offset that we described before. In addition, the data size changes of “MVD” and “Else” are limited as shown in Table 3.
Table 3

The data size increment of each element caused by MVD embedding (container video; fixed QP = 30)

| Encoded element | Bytes in transcoded file | Bytes in embedded file | Increment (%) |
|---|---|---|---|
| Luma from intra | 136,490 | 136,576 | 0.06 |
| Luma from inter | 143,390 | 227,761 | 58.84 |
| Chroma | 15,569 | 15,895 | 2.09 |
| MVD | 16,395 | 18,689 | 13.99 |
| Intra 4 × 4 mode | 16,994 | 16,994 | 0.00 |
| Else | 40,485 | 40,616 | 0.32 |
| All | 369,323 | 456,531 | 23.61 |
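The skip rules of the MVD embedding can be summarized in a short sketch. Several details are assumptions rather than statements from the text: two bits are carried by the LSBs of the two MVD components, a "zero MVD" means that both components are zero, and `acceptable` holds only candidates whose SAD is already considered reasonably small. The text does not say what happens when no such candidate matches the bits, so the sketch simply raises an error in that case.

```python
from typing import Dict, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) in quarter-sample units
ZERO: MV = (0, 0)


def diff(mv: MV, predicted: MV) -> MV:
    return (mv[0] - predicted[0], mv[1] - predicted[1])


def embed_mvd_bits(original: MV, predicted: MV,
                   acceptable: Dict[MV, int],
                   bits: Tuple[int, int]) -> Tuple[MV, bool]:
    """Return (chosen MV, True if the two bits were actually carried)."""
    # Zero MV or zero MVD: leave the partition alone so the decoder skips it too.
    if original == ZERO or diff(original, predicted) == ZERO:
        return original, False
    carriers = []
    for mv, sad in acceptable.items():
        d = diff(mv, predicted)
        if mv != ZERO and (d[0] & 1, d[1] & 1) == bits:
            # Sort key: candidates with a nonzero MVD first, then the smallest SAD.
            carriers.append((d == ZERO, sad, mv))
    if not carriers:
        raise ValueError("no acceptable candidate matches the bits")
    zero_mvd, _, chosen = min(carriers)
    # A zero MVD is skipped by the decoder, so the same bits must be retried.
    return chosen, not zero_mvd


def extract_mvd_bits(mv: MV, predicted: MV):
    """Return the two hidden bits, or None for a skipped partition."""
    d = diff(mv, predicted)
    if mv == ZERO or d == ZERO:
        return None
    return (d[0] & 1, d[1] & 1)
```

When the second return value is False, the caller keeps the same two bits for the next partition, which mirrors the re-embedding rule described for zero MVD.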

3.4 Steganography by employing intra prediction modes

In H.264/AVC, intra prediction of the luma and chroma of a frame is important for reducing the coding redundancy since a coding block is usually related to its neighbors. In the baseline profile of H.264/AVC, four 16 × 16 or nine 4 × 4 intra prediction modes can be applied to the luma, while four 8 × 8 prediction modes are available for the chroma. Our proposed scheme only utilizes the nine 4 × 4 intra prediction modes for information hiding. Figure 3a shows the 4 × 4 intra prediction. The encoded/reconstructed samples above and to the left (labeled A to M) of the current block are available to both the encoder and decoder, so nine prediction modes, including eight directions and one DC prediction, can be calculated. If the neighboring upper or left block of the current block is not available, the number of available modes is reduced. For instance, if the left block is available while the upper block is not, only the “Horizontal”, “DC” and “Horizontal-Up” modes can be chosen, since these modes do not need the samples above the block. Compared with the 16 × 16 luma and 8 × 8 chroma prediction modes, the 4 × 4 intra prediction modes offer finer prediction, so modifying them affects the coding performance less. Besides, the content of these blocks is relatively richer and may thus be more suitable for imperceptible information hiding.
Fig. 3

a The labeling of prediction samples and b the directions of 4 × 4 intra prediction

To embed the information, one approach is to classify the nine modes into two groups, i.e. one representing “0” and one representing “1”, and let the Rate Distortion Optimization (RDO) decide a better intra prediction mode from the target group. One bit can thus be embedded in each 4 × 4 subblock. This process is termed as “Intra Prediction Mode (IPM) Embedding”. Again, we take “Container” as an example. By doing so, the payload can reach 8,330 bytes and the total bit-stream length is enlarged by 6.85%. The main reason of this bit-stream size increase comes from the fact that the correlation in the intra prediction modes of adjacent blocks is not taken into account. Table 4 shows the data size increments of elements. We can see that “Luma from Intra” and “Intra 4 × 4 Mode” are affected most. In our opinions, this file increment may be acceptable but a security concern may arise. Similar to the inter prediction, in H.264/AVC coding, the mode of the current block is first predicted by the minimum of the prediction modes of its two neighbors, i.e. the upper and left blocks. If the mode matches with the predicted one, only one flag bit called “Most Probable Mode” (MPM) is asserted and sent. Otherwise, this flag bit will be set as “0” and three extra bits have to be sent to signal which of the remaining eight modes is used. A more delicate method is thus to modify the intra modes when the flag bit is “0” since this block may be quite different from its neighbors and suitable for the modification. Besides, the use of MPM appears quite often in normal videos and we should keep this property in our stego video. We divide the eight modes into two groups to represent the binary secret information and the division or classification is applied according to Fig. 3. If the DC mode is not MPM, we replace the direction of MPM by the DC mode and then assign “0” and “1” to the prediction directions, which are known by the embedder and detector. 
The RDO is still required to determine a better prediction mode. The results of the four test videos are listed in Table 5 and we call this process the “Enhanced IPM Embedding”. For “Container”, although the payload becomes 3,074 bytes, the increment of file size is limited to around 3%. The data size increments of “Container” are shown in Table 6. We can see that the “Luma from Intra” is affected most and, again, the information hiding in the intra residuals may help to alleviate this increment of data size. The data volume of “Intra 4 × 4 Mode” is decreased because we avoid the modification if MPM is used so that the reasonable number of MPM’s can still be preserved as the original compressed bit-stream.
Table 4

The data size increment of each element caused by IPM embedding (container video; fixed QP = 30)

| Encoded element | Bytes in transcoded file | Bytes in embedded file | Increment (%) |
|---|---|---|---|
| Luma from intra | 136,490 | 154,738 | 13.36 |
| Luma from inter | 143,390 | 144,512 | 0.78 |
| Chroma | 15,569 | 15,545 | −0.15 |
| MVD | 16,395 | 16,448 | 0.32 |
| Intra 4 × 4 mode | 16,994 | 22,723 | 33.71 |
| Else | 40,485 | 40,676 | 0.47 |
| All | 369,323 | 394,642 | 6.85 |

Table 5

The payload and corresponding file size increment of two embedding strategies in intra prediction modes

| Video | IPM payload (bytes) | IPM increment (%) | Enhanced IPM payload (bytes) | Enhanced IPM increment (%) |
|---|---|---|---|---|
| Container | 8,330 | 6.85 | 3,074 | 2.95 |
| Hall monitor | 10,974 | 10.16 | 3,596 | 4.01 |
| Stefan | 41,922 | 8.67 | 21,180 | 3.79 |
| Mobile | 16,234 | 1.95 | 8,354 | 1.11 |

Table 6

The data size increment of each element caused by enhanced IPM embedding (container video; fixed QP = 30)

| Encoded element | Bytes in transcoded file | Bytes in embedded file | Increment (%) |
|---|---|---|---|
| Luma from intra | 136,490 | 147,460 | 8.03 |
| Luma from inter | 143,390 | 144,251 | 0.60 |
| Chroma | 15,569 | 15,560 | −0.05 |
| MVD | 16,395 | 16,399 | 0.02 |
| Intra 4 × 4 mode | 16,994 | 16,043 | −5.59 |
| Else | 40,485 | 40,797 | 0.77 |
| All | 369,323 | 380,510 | 3.02 |
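The Enhanced IPM Embedding can be sketched as follows. The most probable mode is taken as the minimum of the two neighboring modes, as stated in the text, and blocks whose best mode equals the MPM are left untouched. The grouping of the eight remaining modes follows Fig. 3 in the paper, which is not reproduced here, so the sketch substitutes an assumed grouping: the parity of the 3-bit index of the remaining mode. The RD costs are supplied by the caller, and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def most_probable_mode(left_mode: int, upper_mode: int) -> int:
    return min(left_mode, upper_mode)


def rem_index(mode: int, mpm: int) -> int:
    """3-bit index of a non-MPM mode among the remaining eight modes."""
    return mode if mode < mpm else mode - 1


def embed_ipm_bit(rd_cost: Dict[int, float], mpm: int, bit: int) -> Tuple[int, bool]:
    """Return (chosen mode, True if a bit was carried by this block)."""
    best = min(rd_cost, key=rd_cost.get)
    if best == mpm:
        return best, False  # keep MPM blocks as they are
    # Assumed grouping (not the paper's Fig. 3): parity of the remaining-mode index.
    group = [m for m in rd_cost if m != mpm and (rem_index(m, mpm) & 1) == bit]
    if not group:
        raise ValueError("no available mode represents this bit")
    return min(group, key=rd_cost.get), True


def extract_ipm_bit(mode: int, mpm: int) -> Optional[int]:
    """Return the hidden bit, or None when the block uses the MPM."""
    if mode == mpm:
        return None
    return rem_index(mode, mpm) & 1
```

Restricting the change to non-MPM blocks keeps the number of MPM flags close to that of the unmodified stream, which is consistent with the small change in “Intra 4 × 4 Mode” reported in Table 6.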

3.5 Profiles of information hiding

After analyzing the three coding features for information hiding, we would like to define the profiles of steganography in the video stream, i.e. High, Medium and Low, which are related to the amount of payload and may be used in different applications or requirements. The High profile uses the enhanced residual embedding, which may make use of zero indices as described before, the MV embedding with RDO for choosing a more appropriate motion vector, and the 4 × 4 intra coding mode. The High profile is targeted at embedding the maximum amount of hidden information, which will mainly act as a reference for comparison or be used when the high payload is the only major concern. The Medium profile includes the information embedding in residuals, the MVD embedding, which will skip zero MVD and zero MV, and the 4 × 4 intra coding mode, which takes MPM into account. The Medium profile is the recommended embedding mode since it is designed to achieve a good balance among several requirements. The Low profile will only employ the intra and inter residuals. The Low profile is also an efficient implementation for faster information embedding, i.e. a mode copy scheme, and we describe its details as follows.
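The three profiles can be summarized as a small configuration table. The field names and string labels below are invented for this summary and do not come from the authors' implementation; the assignment of tools to profiles follows the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingProfile:
    residual: str    # "f4" or "f4+zero-promotion" (also uses near-miss zero indices)
    motion: str      # "none", "mvd" (skips zero MV and zero MVD) or "mv"
    intra4x4: str    # "none", "enhanced-ipm" (respects MPM) or "ipm"
    mode_copy: bool  # reuse coding modes of the input H.264 stream


PROFILES = {
    "High": EmbeddingProfile("f4+zero-promotion", "mv", "ipm", False),
    "Medium": EmbeddingProfile("f4", "mvd", "enhanced-ipm", False),
    "Low": EmbeddingProfile("f4", "none", "none", True),
}
```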

Our embedding methods require a transcoding process. If the input video is already an H.264/AVC bit-stream, we may avoid repeating the time-consuming mode decision by referencing the modes in the input H.264 video, as long as the settings of the video, including the GOP structure and the bit-rate, are kept the same. In our implementation of the Low profile embedding, the recorded coding information consists of the frame type, macroblock type, intra- and inter-prediction modes and motion vectors. After decoding a frame, the video encoder assigns those modes/data directly to speed up the whole transcoding process. We compare the typical transcoding and mode-copy encoding in Table 7, where Frames Per Second (FPS) is used as the performance measurement. No information embedding method is applied in either case and the rate control is enabled. The bit-rate is set to 500 Kbps (kilobits per second). The quality measured in Peak Signal to Noise Ratio (PSNR) is nearly maintained (37.51 dB versus 37.77 dB), so the mode-copy procedure is comparable to the typical transcoding in quality while a substantial amount of time is saved (18.42 FPS versus 11.99 FPS).
Table 7

The comparison of the typical transcoding and mode-copy encoding (container: Bit-rate = 500 Kbps)

| Method | PSNR (dB) | FPS |
|---|---|---|
| Typical transcoding | 37.77 | 11.99 |
| Transcoding with mode-copy | 37.51 | 18.42 |

For the information embedding, we may use both the intra- and inter-prediction modes as the references and try to modify the modes directly without using RDO. If the speed is the major concern, this approach is feasible. To be more specific, in the information hiding on the intra-prediction modes, we can replace the prediction direction with DC (if DC mode is not MPM) and then group the modes into known pairs according to the prediction directions. For the information hiding in MVD, we may simply change the bits according to the incoming motion vectors. However, according to our previous discussion, the RDO is quite important during the transcoding for determining a suitable mode and maintaining the reasonably good rate-distortion performance of the stego video. As the main objective of the Low profile is to increase the efficiency of execution, we omit the information embedding of both motion vectors and intra modes and make use of the residuals only. The intra- and inter-prediction modes can thus be copied directly from the original H.264 video stream and used for the subsequent encoding.

3.6 Information hiding in advanced audio coding

MPEG AAC is a standardized compression scheme for digital audio, designed as the successor of the MP3 format. AAC makes use of many advanced coding techniques available at the time of its development to provide high-quality multi-channel audio. It has therefore become the core algorithm for encoding sound in audio-visual compression standards. At the beginning of encoding, a filter bank transforms the time-domain signal into the frequency domain. An iteration loop is then applied to quantize the spectral coefficients. The scalefactors of the subbands are obtained and multiplied by all of the coefficients in the corresponding scalefactor bands. The number of required bits and the related information are determined to control the trade-off between the audio distortion and payload. Entropy coding then follows, using the 12 pre-defined Huffman tables. Since the scalefactors and spectral coefficients occupy a significant part of the coded audio streams, we make use of them to embed the information.

The scalefactors have been used for effective information embedding [5]. In our implementation, scalefactors equal to zero are skipped, as are the scalefactor bands that use pseudo codebooks in intensity stereo. For nonzero scalefactors, the message bit is embedded by making the LSB of each scalefactor equal to the hidden binary information. The embedding payload of scalefactors, expressed as a percentage of the audio stream size, is shown in Table 8, in which two different audio clips are employed. Music A is a classical music clip while Music B is an electronic music clip. Two target bit-rates, i.e. High: 256 Kbps and Low: 128 Kbps, are used to encode the audio streams. We can see that the payload of the embedding reaches around 1–3% of the total audio stream size.
Table 8 The payload of information embedding in the audio streams

| Audio | Scalefactor, High (%) | Scalefactor, Low (%) | Quantization index, High (%) | Quantization index, Low (%) |
|---|---|---|---|---|
| A (5:06) | 1.55 | 2.86 | 7.10 | 6.79 |
| B (3:51) | 1.26 | 2.52 | 11.90 | 7.72 |
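The scalefactor embedding rule described above can be sketched as follows. This is an illustrative sketch rather than the actual implementation: scalefactors are modelled as non-negative integers, the set of intensity-stereo bands is passed in by the caller, and the handling of a scalefactor of 1 that must carry a 0 bit (it is moved to 2 so that the extractor does not skip it as a zero) is our own assumption, since the text does not say how this case is treated.

```python
from typing import FrozenSet, List, Sequence, Tuple


def embed_scalefactor_lsb(scalefactors: Sequence[int], bits: Sequence[int],
                          intensity_bands: FrozenSet[int] = frozenset()
                          ) -> Tuple[List[int], int]:
    """Write message bits into the LSBs of usable scalefactors.
    Returns the modified scalefactors and the number of bits embedded."""
    out = list(scalefactors)
    used = 0
    for idx, sf in enumerate(out):
        if used >= len(bits):
            break
        if sf == 0 or idx in intensity_bands:
            continue                       # skipped by the embedding rule
        new_sf = (sf & ~1) | bits[used]    # force the LSB to the message bit
        if new_sf == 0:
            new_sf = 2                     # assumption: keep the value nonzero, LSB still 0
        out[idx] = new_sf
        used += 1
    return out, used


def extract_scalefactor_lsb(scalefactors: Sequence[int], count: int,
                            intensity_bands: FrozenSet[int] = frozenset()
                            ) -> List[int]:
    """Read back `count` bits from the LSBs of usable scalefactors."""
    bits: List[int] = []
    for idx, sf in enumerate(scalefactors):
        if len(bits) >= count:
            break
        if sf == 0 or idx in intensity_bands:
            continue
        bits.append(sf & 1)
    return bits


message = [0, 1, 1, 0]
stego, n_embedded = embed_scalefactor_lsb([0, 1, 44, 45, 60, 7], message,
                                          intensity_bands=frozenset({4}))
recovered = extract_scalefactor_lsb(stego, n_embedded,
                                    intensity_bands=frozenset({4}))
```

Because each usable scalefactor carries exactly one bit, the payload grows with the number of nonzero scalefactors rather than with the bit-rate, which is consistent with the higher percentage at the Low bit-rate in Table 8.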

For the spectral coefficients after quantization, we apply the F4 algorithm to embed the information. Table 8 also shows the payload of the quantization indices embedding, which ranges from about 7% to 12% of the audio stream size. It can be seen that “Music B” has a larger payload than “Music A” because Music B has more transient signals than Music A and therefore more non-zero coefficients.
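For readers unfamiliar with F4, the sketch below shows the rule as it is commonly described (it is presented by Westfeld together with F5 [8]): zero coefficients are skipped, positive odd and negative even coefficients stand for 1, positive even and negative odd coefficients stand for 0, a mismatching coefficient has its magnitude decreased by one, and a bit whose carrier shrinks to zero is embedded again in the next coefficient. How this rule is wired into the AAC quantization loop is not shown here, and the function names are ours.

```python
from typing import List, Sequence, Tuple


def f4_bit(coeff: int) -> int:
    """Bit represented by a nonzero coefficient under the F4 mapping."""
    if coeff > 0:
        return coeff & 1                 # positive: odd -> 1, even -> 0
    return 1 - (abs(coeff) & 1)          # negative: even -> 1, odd -> 0


def f4_embed(coeffs: Sequence[int], bits: Sequence[int]) -> Tuple[List[int], int]:
    """Embed bits into quantized coefficients; returns the modified
    coefficients and the number of bits actually embedded."""
    out = list(coeffs)
    i = 0
    for k, c in enumerate(out):
        if i >= len(bits):
            break
        if c == 0:
            continue                     # zeros never carry data
        if f4_bit(c) != bits[i]:
            c = c - 1 if c > 0 else c + 1    # decrease the magnitude by one
            out[k] = c
            if c == 0:
                continue                 # shrinkage: retry the same bit on the next coefficient
        i += 1
    return out, i


def f4_extract(coeffs: Sequence[int], count: int) -> List[int]:
    """Read back `count` bits from the nonzero coefficients."""
    return [f4_bit(c) for c in coeffs if c != 0][:count]


message = [0, 1, 0, 1]
stego_coeffs, n_embedded = f4_embed([3, -2, 0, 1, -1, 4, 5, -6], message)
recovered = f4_extract(stego_coeffs, n_embedded)
```

Since only non-zero coefficients carry bits, a clip with more non-zero quantization indices, such as Music B, offers a larger payload.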

4 Experimental results

Our results are presented in two parts, i.e. the information embedding in video and in audio streams. The payload, quality and increment of bit-stream size are the major concerns in our designs. In the video embedding part, we first use the Medium profile and evaluate its performance on four CIF videos with 300 frames each, i.e. “Container”, “Hall Monitor”, “Stefan” and “Mobile”. Again, the GOP size is set to 15 and only I and P frames are used. The proposed scheme is implemented with Intel Integrated Performance Primitives (IPP) version 5.2, an optimized run-time library that supports efficient H.264/AVC coding.

First, we set a fixed QP value of 30 for all frames in the test videos to see the effects of different embedding methods. When a fixed QP is employed, the trade-off between the hidden information payload and the increment of bit-stream size is the major consideration. For each video, we record the payload and the increment of the bit-stream size in Table 9. We can see that the “Mobile” video provides the largest payload because of the large motions in the content, which lead to more intra blocks. More quantization indices and intra-prediction modes can thus be embedded. In our opinion, the video file size increment is limited, as the file size changes are within 10%.
Table 9 The payload and file size change

| Video | Payload (bytes) | Payload (%) | File size change (%) |
|---|---|---|---|
| Container | 45,400 | 11.32 | 8.56 |
| Hall Monitor | 33,094 | 9.43 | −5.12 |
| Stefan | 216,703 | 12.81 | 2.54 |
| Mobile | 282,762 | 13.39 | 6.20 |

Using a fixed QP value is only an experiment to observe the trade-off among various embedding modes. We then enable the rate control to simulate the scenario of real applications. Under a given target bit-rate, the issue to discuss is the trade-off between the payload and the quality of the video. In Fig. 4, the payloads of the four test videos are shown. We can see that “Container” achieves the best payload performance at all the bit-rates since smaller QP values can be employed and more residuals can be embedded. In fact, the payloads of IPM and MVD tend to be independent of the target bit-rate since they are usually more related to the frame resolution.
Fig. 4 The payload of information embedding in videos under various bit-rates

Then, we consider the quality of embedded videos under various target bit-rates. We present the PSNR values of the embedded videos and the transcoded videos. The quality decrease of the four videos can be read from Table 10. We can see that the quality of transcoded videos with more active content is not as high as that of the more static videos under the same bit-rate, because the active video content may lead to many inter-block residuals for compensating the large variations in video frames. It can be observed that the PSNR of videos with larger content variations, such as “Mobile”, decreases considerably after the information embedding. The reason could be that modifying the motion vectors of such active videos causes serious inaccuracy in the motion compensation. At the lower bit-rates, the difference between the transcoded and embedded videos is not as obvious as that at the higher bit-rates. Despite that, the payload of our scheme can usually exceed 10% of the video size, as shown in Fig. 4.
Table 10 PSNR (dB) of stego videos and transcoded videos under various bit-rates

| Video | 500 Kbps, Transcoded | 500 Kbps, Stego | 1 Mbps, Transcoded | 1 Mbps, Stego | 2 Mbps, Transcoded | 2 Mbps, Stego |
|---|---|---|---|---|---|---|
| Container | 37.77 | 35.78 | 41.37 | 39.10 | 47.10 | 42.95 |
| Hall Monitor | 39.58 | 38.34 | 42.03 | 40.70 | 45.70 | 43.52 |
| Stefan | 28.13 | 26.67 | 31.91 | 29.47 | 37.69 | 33.24 |
| Mobile | 26.72 | 25.03 | 30.25 | 27.59 | 35.51 | 30.80 |
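To make the quality decrease explicit, the PSNR drop (transcoded minus stego) can be computed directly from the Table 10 entries. The short sketch below does only this subtraction; the dictionary layout and function name are ours.

```python
from typing import Dict, List, Tuple

BITRATES = ["500 Kbps", "1 Mbps", "2 Mbps"]

# (transcoded PSNR, stego PSNR) in dB, copied from Table 10.
TABLE_10: Dict[str, List[Tuple[float, float]]] = {
    "Container":    [(37.77, 35.78), (41.37, 39.10), (47.10, 42.95)],
    "Hall Monitor": [(39.58, 38.34), (42.03, 40.70), (45.70, 43.52)],
    "Stefan":       [(28.13, 26.67), (31.91, 29.47), (37.69, 33.24)],
    "Mobile":       [(26.72, 25.03), (30.25, 27.59), (35.51, 30.80)],
}


def psnr_drops(table: Dict[str, List[Tuple[float, float]]]) -> Dict[str, List[float]]:
    """PSNR loss in dB caused by embedding, per video and bit-rate."""
    return {video: [round(transcoded - stego, 2) for transcoded, stego in pairs]
            for video, pairs in table.items()}


drops = psnr_drops(TABLE_10)
for video, values in drops.items():
    row = ", ".join(f"{rate}: {d:.2f} dB" for rate, d in zip(BITRATES, values))
    print(f"{video}: {row}")
```

For every video the drop grows with the target bit-rate, from between 1.2 and 2.0 dB at 500 Kbps to between 2.2 and 4.7 dB at 2 Mbps, with “Mobile” showing the largest loss at 2 Mbps.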

We next examine the performance of the Low profile, which skips the most time-consuming procedures in the video encoder, i.e. the mode decisions, to increase the efficiency of execution. First, we record the time of the embedding process with and without the mode-copy procedure. Table 11 shows the ratio of the decrease in execution time, and we can see that the execution time is reduced by at least 27% when the mode-copy strategy is employed. The lower the bit-rate, the larger the improvement. Figure 5 shows the payload of embedding. The dotted lines in Fig. 5 show the payload of the Low profile and the solid lines show the payload of the Medium profile excluding the information embedding in the data of intra and inter predictions. We can see that the payload of the Low profile is slightly higher, especially at the lower bit-rates.
Table 11 The ratio of execution time decrease when the mode-copy strategy is adopted

| Video | 500 Kbps (%) | 1 Mbps (%) | 2 Mbps (%) |
|---|---|---|---|
| Container | 53 | 39 | 30 |
| Hall Monitor | 48 | 40 | 35 |
| Stefan | 35 | 32 | 30 |
| Mobile | 46 | 35 | 27 |

Fig. 5 The embedding payload under various bit-rates. Solid lines: Using Medium profile excluding the information embedding in the data of intra and inter predictions. Dotted lines: Employing the Low profile

One might expect that, because the proposed Low profile skips many searching steps, it would degrade the quality of video frames severely. Figure 6 shows the PSNR values of the four test videos. The dotted lines represent the PSNR values when the Low profile is used. We can see that the quality of these videos is not affected much. In fact, the average PSNR may even be higher than that of the transcoding-based scheme, i.e. modifying only the residuals in the Medium profile, especially at the higher bit-rates. A possible reason is that, after the information is embedded in the previous frames, the number of intra-coded blocks in the current frame may increase when the mode decisions are re-applied in the transcoding process. As the intra-coded blocks consume more bits, the QP values assigned in the following frames may become higher and the average PSNR values of the stego frames may thus be lower than those in the Low profile.
Fig. 6 The video quality under various bit-rates. Solid lines: Using Medium profile excluding the information embedding in the data of intra and inter predictions. Dotted lines: Employing the Low profile

The performances of the three profiles are shown in Table 12, in which “Payload (bytes)” and “Payload (%)” indicate the volume of hidden information measured in bytes and its ratio to the resulting file size, respectively. “Size increase” shows the percentage of file size increment compared with the size of the H.264 compressed video without information hiding. When a fixed QP is used, we focus on the payload and the file size increment. We can see that the file size increment of the High profile is significant, as the sizes of “Container” and “Hall Monitor” are roughly doubled. The Low profile, which employs the residuals only, decreases the file sizes in all four videos. The Medium profile tries to achieve a balance between these two factors. The average PSNR values of frames are also shown for reference. The PSNR of the High profile is the lowest among the three but not as low as might be expected, since the most negative effect is the file size increment when a fixed QP is employed. Then, we enable the rate control mechanism, which will be applied in most applications. The target bit-rate is set to 2 Mbps. Since the file size is controlled, we focus on the frame quality and the payload here. From the average PSNR values and the payloads listed in Table 12, we again conclude that the Medium profile helps to maintain both the video quality and a high payload. It should be noted that the performance of the Low profile is reasonably good when the rate control is used, so the Low profile may be considered if the execution efficiency is the most important requirement.
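Table 12 The performances of three profiles

| Setting | Profile | Metric | Container | Hall Monitor | Stefan | Mobile |
|---|---|---|---|---|---|---|
| Fixed QP = 30 | High | Payload (bytes) | 106,949 | 97,911 | 257,764 | 324,207 |
| Fixed QP = 30 | High | Payload (%) | 14.09 | 13.70 | 14.20 | 14.18 |
| Fixed QP = 30 | High | Size increase (%) | 105.52 | 93.11 | 9.96 | 14.89 |
| Fixed QP = 30 | High | PSNR (dB) | 35.21 | 36.28 | 31.64 | 30.40 |
| Fixed QP = 30 | Medium | Payload (bytes) | 45,400 | 33,094 | 216,703 | 282,762 |
| Fixed QP = 30 | Medium | Payload (%) | 11.32 | 9.43 | 12.81 | 13.39 |
| Fixed QP = 30 | Medium | Size increase (%) | 8.56 | −5.12 | 2.54 | 6.20 |
| Fixed QP = 30 | Medium | PSNR (dB) | 35.74 | 36.49 | 31.84 | 31.00 |
| Fixed QP = 30 | Low | Payload (bytes) | 32,208 | 23,899 | 151,587 | 204,177 |
| Fixed QP = 30 | Low | Payload (%) | 9.22 | 7.43 | 10.21 | 11.69 |
| Fixed QP = 30 | Low | Size increase (%) | −5.41 | −13.10 | −10.07 | −12.23 |
| Fixed QP = 30 | Low | PSNR (dB) | 35.70 | 36.71 | 32.01 | 31.03 |
| Rate control enabled (2 Mbps) | High | Payload (bytes) | 367,763 | 327,075 | 367,072 | 367,688 |
| Rate control enabled (2 Mbps) | High | Payload (%) | 14.70 | 13.07 | 14.60 | 14.62 |
| Rate control enabled (2 Mbps) | High | Size increase (%) | 0.03 | −0.02 | 0.66 | 0.62 |
| Rate control enabled (2 Mbps) | High | PSNR (dB) | 38.53 | 40.59 | 32.55 | 30.11 |
| Rate control enabled (2 Mbps) | Medium | Payload (bytes) | 346,244 | 286,006 | 340,980 | 346,233 |
| Rate control enabled (2 Mbps) | Medium | Payload (%) | 13.92 | 11.59 | 13.69 | 13.90 |
| Rate control enabled (2 Mbps) | Medium | Size increase (%) | −0.04 | −0.02 | 0.98 | 0.54 |
| Rate control enabled (2 Mbps) | Medium | PSNR (dB) | 42.95 | 43.52 | 33.24 | 30.80 |
| Rate control enabled (2 Mbps) | Low | Payload (bytes) | 333,818 | 273,218 | 300,878 | 317,069 |
| Rate control enabled (2 Mbps) | Low | Payload (%) | 13.34 | 10.92 | 12.01 | 12.66 |
| Rate control enabled (2 Mbps) | Low | Size increase (%) | −0.04 | −0.02 | 0.89 | 0.98 |
| Rate control enabled (2 Mbps) | Low | PSNR (dB) | 43.90 | 44.01 | 34.90 | 32.68 |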
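The two percentage metrics of Table 12 follow directly from their definitions in the text. The sketch below writes them out and, as an illustration, backs out approximate file sizes for “Container” with the Medium profile at fixed QP = 30. The function names are ours, and the derived sizes are only approximations because the table entries are rounded to two decimals.

```python
def payload_ratio(payload_bytes: float, stego_size_bytes: float) -> float:
    """Payload (%): hidden bytes relative to the resulting (stego) file size."""
    return 100.0 * payload_bytes / stego_size_bytes


def size_increase(stego_size_bytes: float, plain_size_bytes: float) -> float:
    """Size increase (%): stego file size relative to the H.264 video
    compressed without information hiding."""
    return 100.0 * (stego_size_bytes - plain_size_bytes) / plain_size_bytes


# Table 12, Container, Medium profile, fixed QP = 30.
payload = 45_400                             # bytes
stego_size = payload / (11.32 / 100.0)       # implied stego size, approximate
plain_size = stego_size / (1.0 + 8.56 / 100.0)   # implied size without hiding, approximate

print(f"implied stego size: about {stego_size / 1000:.0f} KB")
print(f"implied plain size: about {plain_size / 1000:.0f} KB")
```

Note that the two percentages use different denominators: the payload is measured against the stego file, whereas the size increase is measured against the file without hidden data.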

For the information embedding in audio, we select some audio clips from the EBU SQAM (Sound Quality Assessment Material) CD, including “abba”, “speech”, “baird” and “bach”. All the audio clips from the EBU SQAM CD are encoded in a lossless compression format (FLAC) and we transcode those clips into FLV as the input to our scheme. The audio encoder parameters are set to retain the quality of audio as much as possible. We employ the “target quality mode” in the Nero AAC encoder to preserve the quality of the original audio clips.

We first investigate the payload of embedding. The payload unit that we use is bits per “frame” (bpf), in which the “frame” is the basic element to collect the sampling data. Table 13 shows the ratio of short windows, the payload of scalefactor (SF) embedding, the payload of quantization indices (QC) embedding, and the ratio of the payload to the encoded audio size. All the audio clips are encoded at 192 Kbps. The ratio of short windows shows the characteristics of the audio content. A transient signal is a short-duration signal that contains more non-periodic or high-frequency components. It can be seen that “abba” and “speech” have more transient signals. By comparing the ratio of short windows and the payload, we can see that the ratio of short windows is more related to the embedding of the quantization indices. Figure 7 shows the values of the payload under different bit-rates. We can see that the higher the target bit-rate is set, the more payload of quantization indices embedding can be achieved, because the number of coefficients in the subbands increases. The payload of audio embedding can reach around 10% of the audio stream size on average.
Table 13 The payload of the audio embedding (192 Kbps)

| Audio | Ratio of short windows (%) | Payload, SF (bpf) | Payload, QC (bpf) | Ratio (%) |
|---|---|---|---|---|
| abba | 14.78 | 76 | 341 | 10.55 |
| speech | 11.42 | 77 | 339 | 10.61 |
| baird | 0.11 | 92 | 271 | 8.20 |
| bach | 2.60 | 72 | 284 | 9.42 |

Fig. 7 The payload of quantization indices embedding in various bit-rates

Finally, some stego FLV files, in which both the video and audio data streams are embedded with a large amount of information, can be found on our research webpage.1 The payload of each FLV file can exceed 10% of the FLV file size and the perceptual quality is reasonably maintained.

5 Conclusion

We have developed a high-volume steganographic scheme for FLV files. Both the video and audio streams in FLV files are employed and several coding features are taken into account to achieve effective steganography. Three profiles of information hiding are presented and users may select a suitable profile to meet the requirements of their applications. The payload of hidden information, the perceptual quality of audio-visual data, the file size increment and the security should be the major concerns in the design of such an information-hiding scheme. Experimental results demonstrate that the payload can exceed 10% of the total file size when a good trade-off is achieved.

Footnotes

References

  1. Bhaumik AK, Choi M, Robles RJ, Balitanas MO (2009) Data hiding in video. Int J Database Theory Appl
  2. Budhia U, Kundur D, Zourntos T (2006) Digital video steganalysis exploiting statistical visibility in the temporal domain. IEEE Trans Inf Forensics Secur 1(4):502–516
  3. Fang DY, Chang LW (2006) Data hiding for digital video with phase of motion vector. In: IEEE international symposium on circuits and systems
  4. Kapotas SK, Varsaki EE, Skodras AN (2007) Data hiding in H.264 encoded video sequences. In: IEEE 9th workshop on multimedia signal processing, pp 373–376
  5. Kirbiz S, Lemma AN, Celik MU, Katzenbeisser S (2007) Decode-time forensic watermarking of AAC bitstreams. IEEE Trans Inf Forensics Secur 2(4):683–696
  6. Liu Z, Lang H, Niu X, Yang YX (2004) A robust video watermarking in motion vectors. In: International conference on signal processing, pp 2358–2361
  7. Wang HX, Li YN, Lu ZM, Sun SH (2005) Compressed domain video watermarking in motion vector. In: Knowledge-based intelligent information and engineering systems. Springer, New York, pp 580–586
  8. Westfeld A (2001) F5—a steganographic algorithm. In: Information hiding. Springer, New York, pp 289–302
  9. Wu M, Liu B (2003) Data hiding in image and video: part I—fundamental issues and solutions. IEEE Trans Image Process 12(6):685–695
  10. Wu M, Yu H, Liu B (2003) Data hiding in image and video: part II—designs and applications. IEEE Trans Image Process 12(6):696–705
  11. Xu C, Ping X, Zhang T (2006) Steganography in compressed video stream. In: First international conference on innovative computing, information and control. IEEE, Piscataway, pp 269–272
  12. Yang G, Li J, He Y, Kang Z (2010) An information hiding algorithm based on intra-prediction modes and matrix coding for H.264/AVC video stream. AEU-Int J Electron Commun

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, National Central University, Jhongli, Republic of China