1 Introduction

Video steganography embeds a secret message into an ordinary, innocent-looking cover digital video. The rapid growth of multimedia applications that communicate compressed videos between parties, such as internet video streaming, video telephony, and video conferencing, makes compressed video streams an attractive choice for steganography [1, 2]. By hiding the secret message in a compressed video, a large hiding capacity can be achieved while it remains hard to prove the presence of the message [3].

H.264, standardized as the Advanced Video Codec (AVC) [4], is the most widely used video codec [5]. Despite being superseded by H.265 [6], H.264 is still supported by more than 65% of security monitoring devices [7]. Regarding software, recent statistics show that 67% of network videos use the H.264 codec [7]. Moreover, H.264 is one of only three video codec choices available for YouTube live streaming [8]. H.264 compressed videos contain various possible hiding spots within the structure of the compression scheme [2]. For example, within the H.264 compression scheme, many coding stages can be used as hiding spots, such as the coded DCT coefficients [9,10,11,12], some flexible coding parameters chosen by the encoder (such as macro-block partitioning and quantization parameters) [13,14,15], the entropy coding stage [16, 17], and the motion estimation stage [18,19,20,21,22,23], where embedding is performed by introducing modifications to the motion vectors (MVs) according to the secret message. The modifications are applied to MV attributes, such as magnitude and phase angle, either to every MV (for example, using LSB embedding) or selectively to specific candidate motion vectors (CMVs) that satisfy certain criteria.

MV-based steganographic approaches have attracted researchers because they have little impact on the quality of the reconstructed frames and on the statistical properties of the coefficients. In other words, the statistical properties of the frames’ spatial/frequency coefficients do not change significantly after embedding. Thus, unlike steganographic schemes that modify spatial/frequency coefficients directly, MV-based steganography is harder to detect [24]. However, existing MV-based steganographic approaches for the H.264 video codec fail to satisfy the real-time constraints imposed by several emerging applications such as live streaming or conferencing. As we indicate shortly, the embedding process in these approaches either involves complex calculations or requires the entire frame or group of pictures (GOP) to be available before embedding. According to [25], MV-based steganographic approaches can be classified into three generations.

The first-generation MV-based steganographic approaches share the basic idea of choosing CMVs for modification based on predetermined selection rules, such that modifying them by predetermined methods (such as LSB) does not add much distortion to the compressed video. For example, the approach in [23] marks fast MVs as the CMVs based on the assumption that fast MVs (with magnitude larger than a threshold) are more likely to be erroneous and their associated macro-block prediction error (MBPE) is expected to be large; thus, modifying such CMVs does not add much distortion. Selecting CMVs based on their magnitude allows both encoder and decoder to agree on a fixed threshold. Therefore, the approach involves only simple calculations for embedding, which can be implemented in real time. However, as indicated in [18], the relation between the magnitude of an MV and its associated MBPE does not always hold. A more accurate distortion-related selection criterion was proposed in [18], in which the CMVs are selected if their associated MBPE (measured by PSNR [26]) is higher than a threshold. The embedding stage alters the CMVs to embed the secret message. Because the MBPE undergoes lossy compression, a compression-decompression step is performed at the encoder to determine whether the MBPE is still larger than the threshold at the decoder. If the associated MBPE of an altered CMV in a frame falls below the threshold after lossy compression-decompression, the embedding stage decreases the threshold and starts over for the whole frame. Thus, this approach involves complex calculations associated with the iterative threshold search, which limits its applicability to real-time steganography. The primary disadvantage of these first-generation MV-based steganographic approaches is that their selection criteria fail to maintain the statistical properties of the MVs. Specifically, the original MVs should be coherent, i.e., neighboring MVs tend to have the same magnitude and direction. However, due to the changes introduced by the embedding, the MVs lose coherency and become susceptible to some primitive targeted steganalysis techniques [27, 28]. For example, in [27], the embedding procedure is modeled as an additive operation on the cover video signal with an independent noise signal added to the X and Y components of the MVs. The statistical evaluation of the spatial and temporal correlations between MVs can then reveal the presence of hidden data. Another example is [28], in which the relation between the received and the recompressed (decompressed and compressed again) video streams is used to create a 15-dimensional motion-vector-reversion-based feature set for steganalysis to indicate the presence of hidden messages.
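To make the first-generation idea concrete, the following sketch illustrates magnitude-threshold CMV selection combined with LSB embedding in an MV component. It is a conceptual illustration only; the threshold value, bit allocation, and MV representation are our own assumptions and not the exact procedures of [23] or [18].

```python
# Conceptual sketch of first-generation CMV selection + LSB embedding.
# The threshold, bit allocation and MV representation are illustrative assumptions.
import math

def select_cmvs(mvs, threshold=8.0):
    """Return indices of candidate MVs whose magnitude exceeds the threshold."""
    return [i for i, (mvx, mvy) in enumerate(mvs)
            if math.hypot(mvx, mvy) > threshold]

def lsb_embed(mvs, message_bits, threshold=8.0):
    """Embed one message bit into the LSB of the x-component of each CMV."""
    stego = list(mvs)
    bits = iter(message_bits)
    for i in select_cmvs(mvs, threshold):
        try:
            bit = next(bits)
        except StopIteration:
            break                           # message exhausted
        mvx, mvy = stego[i]
        stego[i] = ((mvx & ~1) | bit, mvy)  # overwrite the LSB of MVx
    return stego

# Only the fast MV (magnitude > threshold) carries a message bit.
print(lsb_embed([(1, 0), (12, -9), (0, 2)], [1, 0, 1]))
```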

The second-generation MV-based steganographic methods utilize another framework based on minimizing the embedding distortion. As steganographic undetectability and embedding rate (bits per cover element) are two conflicting objectives for any steganographic scheme, Wet Paper Codes [29] and Syndrome-Trellis Codes (STC) [30] provide a theoretically proven optimization framework mitigating this conflict. Accordingly, [31, 32] utilize the frameworks in [29, 30], respectively, to minimize the overall embedding distortion. However, since the frameworks in [29, 30] require the whole cover, or at least a large portion of it (typically \(10^6\) cover elements as described in [30]), to be present before performing their optimization calculations, they are not suitable for real-time MV-based steganographic applications. Moreover, since ME algorithms choose the locally optimal MV, i.e., the MV corresponding to the least cost among all its neighbors, any small change to an MV will violate local optimality with high probability. Based on this observation, the authors in [33] used the SAD (Sum of Absolute Differences) distortion measure to determine the probability of MVs being locally optimal and proposed the Add-or-Subtract-One steganalysis technique with a 9-dimensional feature set.

The third-generation MV-based steganographic schemes utilize two basic ideas to defeat the Add-or-Subtract-One technique by attacking its SAD-based distortion features. The first idea is based on the fact that the SAD-based features are affected by lossy compression such as quantization [34]. Accordingly, if the steganography process chooses a non-locally-optimal MV, the distortion introduced to the DCT components by the lossy quantization step will, with high probability, not preserve the evidence of that non-locally-optimal choice [34]. Thus, after the inverse quantization process (Footnote 1), the reconstructed 8-neighbors around the stego MV at the decoder side will have almost the same SAD-based cost. Hence, the local optimality feature is preserved with high probability. The second idea utilizes the STC technique to further minimize the local-optimality-based SAD distortion [34, 35]. Thus, the steganographic schemes in [34, 35] could overcome the Add-or-Subtract-One steganalyzer.

Nonetheless, a steganalysis scheme called Near-Perfect Estimation for Local Optimality (NPELO) was proposed in [25] to address this SAD shortcoming by utilizing an additional distortion measure, the SATD (Sum of Absolute Transformed Differences), which neutralizes the effect of the lossy compression since SATD takes the quantization step into account. The reader can find details about the difference between SAD and SATD in [36]. Accordingly, the NPELO method [25] provides a 36-dimensional feature set (18 for SAD and 18 for SATD) to overcome the SAD shortcoming and attack the steganographic schemes in [34, 35]. An enhanced steganographic technique was proposed in [37] to overcome detection by the NPELO method [25] by considering SATD features in the design of the cost function. By combining STC with this cost function, the scheme in [37] increases security against the NPELO method.

The third-generation steganographic schemes have a major drawback: they do not consider a newer feature called Motion Vector Consistency (MVC) [38]. The MVC feature is based on the observation that, for any video codec, the MVs of the sub-MBs in the same MB are, with high probability, different from one another. Hence, this additional detection feature should be considered besides local optimality and coherency when designing secure steganographic techniques. An enhanced steganographic scheme was proposed in [39] to overcome both the local optimality and the MVC features. However, as it uses STC, which requires the cover length to exceed \(10^6\) elements (MVs in our case) for reliable coding performance [30], it is unsuitable for real-time applications.

This paper proposes an enhanced third-generation MV-based video steganography approach designed to defeat steganalysis built upon the local optimality, MV coherency, and MV consistency features while performing the embedding/extraction in real time. The proposed technique achieves real-time performance by operating on a per macro-block (MB) basis, eliminating the need to wait for the entire frame or GOP and avoiding additional re-coding steps. Furthermore, the alteration of motion vectors (MVs) for embedding occurs during the motion estimation (ME) sub-pixel-refinement stage by employing a rule-based strategy that ensures each MB is compatible for embedding. The compatibility is verified with respect to the local optimality, coherency, and consistency of the MVs. Moreover, the proposed technique achieves a relatively higher embedding rate compared with other steganographic schemes that attack the same steganalysis features.

It is worth mentioning that a recent video steganography category has emerged due to the rapid development of deep learning, namely steganography techniques based on generative adversarial networks (GANs). This category has become increasingly attractive to researchers in both the image [40] and video [41] domains. These approaches are end-to-end and do not require manually designing or adaptively selecting features for information hiding. They train the embedding and extraction processes simultaneously while a third adversarial network plays the role of a steganalyzer. Despite their recent success, these approaches still lack some crucial aspects. First, the adversary is trained for a specific steganalysis technique; thus, it does not generalize easily to other techniques. Second, because videos typically undergo lossy encoding, these approaches are still less robust against video compression than the classical approaches discussed above, which are designed to align with the video encoder pipeline. Therefore, our work in this paper enhances the classical third-generation techniques discussed above.

The rest of the paper is organized as follows. The next section describes some essential preliminaries required for the rest of the paper. In Section 3, we present the proposed technique and its implementation in detail. Section 4 presents the experimental results of the proposed implementation. Finally, we conclude the paper and give some insights about the future work in Section 5.

We gather all acronyms in Table 1 for better readability of the paper.

Table 1 Acronyms used in the paper

2 Preliminaries

2.1 Motion estimation for H.264 encoder

Fig. 1

General block diagram for video encoder. (MC) Motion Compensation, (ME) Motion Estimation, (\(I_n\)) Uncoded MB, (\(P_E\)) Prediction Error, (FM) Frame Memory, (BS) Output Bitstream, (T) Linear transform (DCT), (\(T^{-1}\)) Inverse T, (Q) Quantization module, (\(Q^{-1}\)) Inverse Q, (MV) Motion Vector, (EntC) Entropy coder, (R) Residual Data (lossy compressed \(P_E\)), (P) Predicted MB calculated by ME and reconstructed by MC, (Search Window) Search window from reconstructed previously coded frame

Figure 1 describes the general coding steps of the H.264 video encoder [26]. The ME module estimates an MV for every MB in the current frame by searching a window in the previously reconstructed frame for the best match, i.e., the position with the minimum SAD (Sum of Absolute Differences) or SATD (Sum of Absolute Transformed Differences). The motion compensation (MC) module then uses the estimated MVs to reconstruct the predicted macro-block (P), and the prediction error \(P_E\) is calculated by subtracting P from the current MB (\(I_n\)). The \(P_E\) is then coded by the transform (T) stage (usually a linear transform such as the DCT), quantized by the quantization stage (Q), and fed to the entropy coder (EntC) module.
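In pseudocode, the per-MB inter-coding loop of Fig. 1 reads roughly as follows. This is a schematic sketch in which the module functions are placeholders passed in as callables (assuming array-like MB data), not the actual encoder interfaces.

```python
# Schematic per-MB inter-coding loop corresponding to Fig. 1. The callables
# (me, mc, T, Q, entropy_code) are placeholders for the encoder stages, and
# I_n and the predicted MB are assumed to be numpy-like arrays.
def encode_mb(I_n, search_window, me, mc, T, Q, entropy_code):
    mv = me(I_n, search_window)      # ME: best-match MV (minimum SAD/SATD)
    P = mc(mv, search_window)        # MC: predicted MB from the reference frame
    P_E = I_n - P                    # prediction error
    R = Q(T(P_E))                    # transform + quantization -> residual data R
    return entropy_code(mv, R)       # this MB's contribution to the bitstream BS
```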

The H.264 standard [4, 42, 43] introduces sub-pixel motion estimation as an optional quality-enhancement stage for video coding. After the integer-pixel ME completes, an additional refinement search is executed after performing sub-pixel (half- or quarter-pixel) interpolation. Figure 2 describes this procedure, where the cost, which can be SAD or SATD, is calculated for the full-pixel (F-Pel), half-pixel (H-Pel), and quarter-pixel (Q-Pel) search setups. This cost is denoted by \(C_X\), where X is the specific MV search setup, and the encoder chooses between F-Pel, H-Pel, and Q-Pel according to the minimum value of \(C_X\). As shown in the figure, if \(C_{H\text{-}Pel} < C_{F\text{-}Pel}\), the encoder sets the new search center at the H-Pel position; otherwise, the search center remains unchanged. The Q-Pel search is then performed around the selected search center. If \(C_{Q\text{-}Pel}\) is smaller than the cost of the selected search center, the encoder uses the Q-Pel result as the final MV; otherwise, the final result is the search center (H-Pel or F-Pel). As we discuss shortly, we exploit this sub-pixel refinement stage in designing our steganography approach by introducing some modifications to it.
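The refinement decision described above can be summarized by the following sketch, assuming a generic cost() function (SAD or SATD) and externally supplied candidate positions; the function and parameter names are illustrative, not the encoder's actual API.

```python
# Illustrative sketch of the sub-pixel refinement decision in Fig. 2.
# cost() stands for SAD or SATD; positions are MV candidates (assumed inputs).
def sub_pixel_refine(cost, f_pel, h_pel_candidates, q_pel_candidates_around):
    center = f_pel
    best_h = min(h_pel_candidates, key=cost)
    if cost(best_h) < cost(center):          # C_H-Pel < C_F-Pel
        center = best_h                      # move the search center to the H-Pel
    best_q = min(q_pel_candidates_around(center), key=cost)
    if cost(best_q) < cost(center):          # C_Q-Pel beats the search center
        return best_q                        # use the Q-Pel result
    return center                            # keep the F-Pel or H-Pel result
```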

Fig. 2

Flowchart for the original Sub-Pixel refinement process

2.2 Coherency and consistency of MVs

The coherency of MVs refers to the tendency of neighboring MBs’ MVs to have the same magnitude and direction, as shown in Fig. 3. In other words, the differences between neighboring MVs tend to be zero-mean distributed [37]. Several steganalysis approaches detect MV-based steganography by checking the coherency of the neighboring MVs. For example, the method in [27] utilizes 12 features based on the coherency of the MVs and builds a Support Vector Machine (SVM) classifier to detect MV-based steganography for MPEG-2 videos [44].

On the other hand, the consistency of MVs is a concept introduced in [38] based on the observation that the H.264 encoder decides either to keep the larger block or to divide it into smaller sub-MBs with different MVs after comparing the costs of the two cases and choosing the one with minimum cost. Thus, if the encoder decides to divide a larger block, the resulting neighboring sub-MBs will, with high probability, have non-coherent MVs. Conversely, if the sub-MBs within the same larger block carry identical MVs, this can be treated as evidence of a steganographic modification, since the cost of using a single MV for one large block would be smaller than repeating the same MV multiple times (once per sub-MB).

Thus, the concept of MV consistency reflects the non-coherent nature of the sub-MBs’ MVs in regular (non-modified) original videos, as shown in Fig. 3. Therefore, MV consistency can be used to detect MV-based steganography [38]. The two features, coherency and consistency, can be used together for enhanced steganographic detection, as coherency applies only to MBs of type \(16\times 16\), while consistency applies only to sub-MBs.
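As an illustration of how a steganalyzer might quantify these two properties, consider the following simplified sketch; the feature definitions here are rough stand-ins for the full feature sets of [27] and [38], and the function names are ours.

```python
import numpy as np

def coherency_score(mv_field):
    """Mean magnitude of the differences between horizontally and vertically
    neighboring MVs; low values indicate a coherent motion field."""
    mv = np.asarray(mv_field, dtype=float)              # shape: (rows, cols, 2)
    dh = np.linalg.norm(np.diff(mv, axis=1), axis=-1)   # horizontal neighbors
    dv = np.linalg.norm(np.diff(mv, axis=0), axis=-1)   # vertical neighbors
    return float(np.concatenate([dh.ravel(), dv.ravel()]).mean())

def consistency_suspicious(sub_mb_mvs):
    """Flag an MB whose sub-MBs all carry identical MVs: a rational encoder
    would have kept the larger partition instead (Section 2.2)."""
    return len(sub_mb_mvs) > 1 and len(set(map(tuple, sub_mb_mvs))) == 1
```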

Fig. 3

Example illustrating the consistency and coherency of MVs

2.3 Local optimality

ME algorithms choose the MV corresponding to the least cost among all its neighbors because it corresponds to the smallest number of bits required to represent the MV and its corresponding error signal, calculated by SAD or SATD. Thus, the MV should be locally optimal with respect to its neighbors [33]. When a steganographic technique modifies the MVs for embedding, there is a high probability that the modified MVs are no longer locally optimal, i.e., that another MV exists with a smaller cost.

Based on the local optimality of the MVs, the authors in [33] proposed an algorithm that calculates the SAD errors for the 8-neighbors surrounding each MV. If any of the eight neighbors achieves a SAD error lower than that of the initially received MV, this is treated as an indication of steganographic modification. The authors designed a 9-dimensional feature set based on local optimality and trained an SVM for prediction. Their method was extended in [25] with a 36-dimensional feature set for better accuracy.
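A minimal sketch of this local-optimality test, the core of the 9-dimensional feature set in [33], is given below; sad_cost is an assumed helper that returns the matching error of the block at a given candidate MV.

```python
# Sketch of the local-optimality test used in [33]: the received MV should have
# a cost no larger than any of its 8 neighbors; otherwise it is flagged as a
# possible stego modification. sad_cost is an assumed helper returning the SAD
# (or SATD) matching error of the block at a candidate MV.
NEIGHBOR_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]

def is_locally_optimal(mv, sad_cost):
    mvx, mvy = mv
    center_cost = sad_cost((mvx, mvy))
    return all(sad_cost((mvx + dx, mvy + dy)) >= center_cost
               for dx, dy in NEIGHBOR_OFFSETS)
```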

3 Proposed video steganography approach and its implementation

In this section, we present our proposed video steganography technique, designed to defeat steganalysis built upon the local optimality, MV coherency, and MV consistency criteria while performing the embedding/extraction in real time. The proposed technique satisfies these constraints by operating per MB in the ME sub-pixel-refinement stage, without waiting for the whole frame or GOP and without performing any additional ME or re-coding step. For each MB, the proposed technique checks whether the MB is suitable for embedding through our designed MB compatibility criteria. The compatibility of the MB for modification is carefully checked to ensure that steganalysis methods built on the MV local optimality, coherency, and consistency features do not detect the embedding. In the following, we present the proposed technique followed by its implementation details.

3.1 Macro-blocks compatibility criteria

Fig. 4

Flowchart for checking the compatibility of MBs for embedding. This compatibility check is a modified version of the original sub-pixel refinement process, so it can be integrated directly into the encoder and run in real time, as it operates on an MB basis

The proposed technique operates on an MB basis. For each MB, the proposed technique checks its compatibility for embedding; the embedding is then performed by modifying the MV of the MB according to the current message bit(s), denoted by Msg. The flowchart of this check is depicted in Fig. 4. As shown in the figure, all \(16\times 16\) MBs are treated as unsuitable for embedding by the proposed technique. In other words, the proposed technique embeds the Msg by modifying only the MVs associated with sub-MBs and introduces no MV-related changes to the \(16\times 16\) MBs. Consequently, neighboring \(16\times 16\) MBs retain their original MVs, and the output video stream of the proposed embedding technique neutralizes the coherency features.

Fig. 5

Flowchart for the embedding process at the encoder (i.e. Q-Pel-Emb-mod(Msg) in Fig. 4)

For all sub-MBs, we first check whether the MV corresponding to the sub-MB at the H-Pel is locally optimal. If so, the sub-pixel refinement process performs the Q-Pel search as usual and examines its output. If the MV corresponding to the Q-Pel is locally optimal and its value matches the Msg, the encoder uses the Q-Pel without any modification (indicated as Q-Pel-done-embd in the figure). This way, we make sure that local optimality is preserved. On the other hand, if the value of the MV at this Q-Pel does not match the Msg, the encoder discards the Q-Pel search result and returns to the H-Pel or F-Pel even if the Q-Pel is locally optimal, preventing the Q-Pel motion vector from being transmitted. Thus, the encoder uses the H-Pel (or F-Pel) in this case, and the MV related to this H-Pel (or F-Pel) is locally optimal with respect to the selected pixel resolution. As shown in the figure, the Msg is embedded by altering an MV in a single case, indicated by the Q-Pel-Emb-mod(Msg) block. This happens when there is no locally optimal MV in the Q-Pel neighborhood of the sub-MB. Thus, when any MV is selected from the neighborhood to embed the Msg, this modification will not be detected by consistency-, coherency-, or local-optimality-based steganalyzers.
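The per-sub-MB decision flow of Fig. 4 can be sketched as follows. This is a simplified, illustrative rendering under our own naming: the callables stand in for the flowchart blocks and are not actual OpenH264 functions, and \(16\times 16\) MBs bypass this routine entirely, as described above.

```python
# Simplified rendering of the per-sub-MB decision flow of Fig. 4. The four
# callables stand in for the corresponding flowchart blocks; they are
# placeholders, not actual OpenH264 functions. Returns (chosen MV, embedded?).
def process_sub_mb(center, msg_bits,
                   locally_optimal, q_pel_search, mv_matches, q_pel_emb_mod):
    if not locally_optimal(center):
        return center, False                      # not compatible: no embedding
    q_best = q_pel_search(center)                 # regular Q-Pel refinement
    if locally_optimal(q_best):
        if mv_matches(q_best, msg_bits):
            return q_best, True                   # "Q-Pel-done-embd": no change needed
        return center, False                      # discard Q-Pel result, no embedding
    # no locally optimal MV in the Q-Pel neighborhood: modify the MV to carry Msg
    return q_pel_emb_mod(center, msg_bits), True  # "Q-Pel-Emb-mod(Msg)"
```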

The detailed operation of the Q-Pel-Emb-mod(Msg) block in Fig. 4 is described in Fig. 5. Before presenting it, let class-1 sub-MBs denote the sub-MBs that result from dividing a larger MB into four parts (\(8\times 8\) or \(4\times 4\) sub-MB types), and let class-2 sub-MBs denote the sub-MBs that result from dividing a larger MB into two parts (\(16\times 8\), \(8\times 16\), \(8\times 4\), or \(4\times 8\) sub-MB types). Both classes are illustrated in Fig. 6.

To avoid being detected by the consistency features, we need to maintain the randomness of the sub-MBs’ MVs. To do so, we embed unconditionally in class-1 sub-MBs without worrying about the consistency of their MVs, while we embed only in the first sub-MB of class-2 sub-MBs. The reason is that the message data is usually modeled as a uniform random sequence (Footnote 2) [45]. Thus, the probability \(P_{c2}\) of embedding the same message symbol the required number of consecutive times in class-2 sub-MBs is higher than the corresponding probability \(P_{c1}\) for class-1 sub-MBs; specifically, \(P_{c1} = P_{c2}^{3}\). Hence, in the case where each modified MV holds a 2-bit symbol of the Msg, the probability of obtaining an identical symbol for all child sub-MBs (i.e., two consecutive times for class-2 and four consecutive times for class-1) is 0.25 for class-2 and \(0.25^3\) for class-1. See Table 2 for other embedding settings. Note that, in practical scenarios, the repeated symbols may be distributed across non-neighboring sub-MBs; thus, the probabilities listed in Table 2 can be considered an upper limit.
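The probabilities discussed above follow directly from modeling the message as uniform random symbols; the short calculation below illustrates how they are obtained for 1-, 2-, and 3-bit symbols (assuming these are the embedding settings considered in Table 2).

```python
# Probability that all sub-MBs of one MB receive the same random symbol.
# class-2: 2 sub-MBs -> 1 extra symbol must match; class-1: 4 sub-MBs -> 3 must match.
for bits in (1, 2, 3):
    p_symbol = 1 / 2 ** bits            # probability of matching a given symbol
    p_c2 = p_symbol ** (2 - 1)
    p_c1 = p_symbol ** (4 - 1)          # = p_c2 ** 3
    print(f"{bits}-bit symbols: class-2 = {p_c2:.6f}, class-1 = {p_c1:.6f}")
```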

Fig. 6

Different Macro-Block partitioning modes for inter-frame prediction. Class-1 sub-MBs are shaded while class-2 sub-MBs are not

When embedding in class-2 sub-MBs, we ensure by a simple check that the first sub-MB’s MV differs from the second one’s. If the MV of the second sub-MB matches the first one after the modification introduced by the embedding, we modify the second MV randomly. This check is essential to avoid the modified MV of the second sub-MB becoming identical to the first modified one, which would make it prone to detection by the consistency features. This procedure is represented by the KeepConsistency block.

After determining the compatibility of the sub-MB for embedding, the next step is to perform data embedding and extraction, which will be described in the following subsection.

Table 2 The probability of randomly getting the same embedding symbol for all sub-MBs of class-1 and class-2 sub-MBs for different embedding settings

3.2 Embedding and extraction

As indicated above, the embedding is performed by modifying the MV associated with every class-1 sub-MB and with the first sub-MB of class-2 sub-MBs. The modification is performed when there is no locally optimal MV in the Q-Pel neighborhood of the sub-MB, as indicated by the Q-Pel-Emb-mod(Msg) block in Fig. 4. We reach the Q-Pel-Emb-mod(Msg) block when either the F-Pel or the H-Pel is used as the refinement search center (indicated by the gray shaded rectangle in Fig. 4). The proposed technique modifies the MV according to (a) the Msg and (b) the refinement search center (F-Pel or H-Pel). The modification for the F-Pel, diagonal H-Pel, vertical H-Pel, and horizontal H-Pel refinement search centers is described in Fig. 8(a), (b), (c), and (d), respectively. As shown in the figure, the proposed embedding technique takes 3 bits as Msg and modifies the MV accordingly. For example, if the Msg is 001 and the refinement search center is the F-Pel, the MV is modified to point to the upper position. As another example, if the Msg is 111 and the refinement search center is one of the diagonal H-Pels, the chosen position is the right one.

As illustrated in Fig. 8, the embedding diagram differs depending on whether the search center is the F-Pel or one of the H-Pel positions (horizontal, vertical, or diagonal). To explain the rationale, let us consider the location designated by the shaded Q in Fig. 7 as an example. Since the decoder only receives the MV pointing to the location marked by the shaded Q, this location can be interpreted at the decoder side in four different ways:

  • Upper right direction w.r.t the F-Pel.

  • Lower left direction w.r.t the upper right diagonal H-Pel.

  • Lower right direction w.r.t the upper vertical H-Pel.

  • Upper left direction w.r.t the right horizontal H-Pel.

This means that three H-Pel positions, in addition to the F-Pel, can serve as the search center for a single received position, which creates ambiguity at the decoder. To remove this ambiguity and allow uniquely decodable embedding, we use a different embedding diagram for each search center, ensuring that all these possibilities map to the same Msg. Thus, when the shaded Q in Fig. 7 is mapped according to the rules in Fig. 8, the result (shown in Fig. 9) is the symbol 010 for all four possibilities described above. Hence, the position of the shaded Q in Fig. 7 is uniquely decodable to the symbol 010.

Fig. 7

Sub-pixel refinement scheme after interpolation. Symbols F, H and Q denote F-Pel, H-Pel, Q-Pel respectively

Fig. 8

Embedding scheme for (a) F-Pel, (b) diagonal H-Pel, (c) vertical H-Pel, (d) horizontal H-Pel

Fig. 9

Actual embedding diagram according to Figs. 7 and 8

The decoding process is straightforward. For a received MV with MVx and MVy denoting its X- and Y-components (in quarter-pixel units after interpolation), the decoder first calculates mod(MVx,4) and mod(MVy,4) and then directly extracts the Msg by using these values as coordinates into the table in Fig. 10. For example, consider an MV with components (27, 35). The coordinates of the extracted message are (27 mod 4, 35 mod 4) = (3, 3); these coordinates index the table in Fig. 10, and hence the message is 000. Note that the entries denoted by X in the table in Fig. 10 represent the F-Pel and H-Pel search centers, which cannot be used for embedding.
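As a sketch, the extraction step amounts to a modulo-4 table lookup. In the snippet below, the table is a placeholder holding only the entry confirmed by the worked example above; the complete mapping is the one given in Fig. 10.

```python
# Sketch of message extraction at the decoder: a modulo-4 lookup into the table
# of Fig. 10. EXTRACTION_TABLE below is a placeholder: only the entry confirmed
# by the worked example in the text is filled in; the full table is in Fig. 10.
EXTRACTION_TABLE = {(3, 3): "000"}       # (MVx mod 4, MVy mod 4) -> 3-bit symbol

def extract_symbol(mvx, mvy, table=EXTRACTION_TABLE):
    key = (mvx % 4, mvy % 4)
    return table.get(key)                # None for 'X' (search-center) or unknown entries

print(extract_symbol(27, 35))            # -> "000", as in the worked example
```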

The following subsection presents the implementation of our embedding and extraction schemes using an open-source implementation (OpenH264 [46]) of the H.264 codec.

3.3 Implementation of the proposed video steganography within OpenH264

We have implemented the proposed technique using the OpenH264 [46] software encoder for H.264. The OpenH264 encoder achieves real-time performance because it exploits the SIMD (Footnote 3) instruction sets of the x86 and ARM architectures. However, the implementation includes several optimizations and simplifications, such as:

  1. The ME algorithm used is the Diamond search algorithm [47].

  2. The encoder does not support bi-directional frames (only P-frames are supported).

  3. For the ME sub-pixel-refinement stage, only four positions are checked. These positions are marked as (F, H, Q) in Fig. 11. In other words, the OpenH264 ME sub-pixel-refinement stage checks only four positions (up, down, right, and left, with no diagonal positions checked).

As OpenH264 uses only four search positions in both the H-Pel and Q-Pel stages, we had to adapt the generalized technique of the previous subsections for implementation inside the OpenH264 encoder. Accordingly, the modified embedding technique embeds 2 bits per MV instead of 3 bits, and only two mapping diagrams are utilized instead of four, as described in Fig. 12. The decoder side works as in the generalized technique, except that the extraction table is replaced by the table in Fig. 13.

4 Experimental results

This section compares the proposed technique with the MVMPLO [35] approach. The comparison focuses on the detectability of both approaches by the local-optimality-based NPELO technique [25], the coherency-based detector [27], and the MV-consistency-based detector [38]. The MVMPLO approach was chosen for comparison among other steganographic methods because it achieves the best security performance against local optimality and consistency analyzers according to [25] and [38]. Additionally, we provide another comparison with the methods proposed in [39].

This section is organized as follows. First, we present our experimental setup, including the video dataset, the steganalyzers used in our experiments, and the metrics used to measure the performance. Then, we present our experimental comparison with the MVMPLO [35] method. Finally, we present our experimental comparison with the methods in [39].

Fig. 10

The extraction table

Fig. 11

OpenH264 sub-pixel refinement scheme after interpolation. Symbols F, H and Q denote F-Pel, H-Pel, Q-Pel, respectively. Sub-pixels denoted by x are not included in sub-pixel search

4.1 Experimental setup

The video dataset used in our comparison contains 44 videos from [48, 49]. The dataset contains 5, 11, and 28 videos with 1080p, 720p, and CIF (\(352\times 288\)) resolutions, respectively, and covers a large diversity of motion dynamics. To show this, we compute the Motion Activity Index [50] (MAI) for each video in the dataset. The MAI is a number between 1 and 5 that describes the dynamics of the video, where 1 implies very low dynamics and 5 implies very high dynamics. A summary of our dataset is given in Table 3.

The proposed technique is implemented and integrated into the OpenH264 [46] real-time video encoder. We have utilized the MVMPLO implementation in [38] with the x264 [51] video encoder.

All the steganalysis methods used in our experiments first extract features from the videos and then employ these features to build a classifier that labels a video as cover (original) or stego (containing hidden data). For the local-optimality-based steganalyzer in [25], we utilized the feature-extraction tool [52] implemented by the authors. For the consistency-based steganalyzer in [38], we also used the authors’ feature-extraction tool, which produces 12 features. We implemented the coherency-based steganalyzer [27] ourselves, as no implementation is available from the authors.

Fig. 12

The modified embedding scheme for (a) F-Pel and (b) H-Pel of OpenH264 encoder

Fig. 13

The modified extraction table of OpenH264

Table 3 Our video dataset

After feature extraction for each steganalysis method, we used the SVM tool in Matlab-2020a© to build an SVM classifier for each method. We used a 5-fold cross-validation procedure to determine the best SVM kernel and its best parameters. We found that the fine Gaussian kernel with a kernel-scale parameter of 1.5 achieves the best accuracy for the local-optimality-based steganalyzer [25], while the Gaussian kernel with a kernel-scale parameter of 0.87 achieves the best accuracy for the consistency-based steganalyzer [38] and the coherency-based steganalyzer [27]. As recommended by each steganalysis method, the dataset is fed to the steganalyzers divided into GOP samples: 12 frames per GOP for the local-optimality-based and coherency-based steganalyzers and six frames per GOP for the consistency-based steganalyzer.

Note that the coherency-based steganalyzer [27] recommends an SVM with a linear kernel, but the original method was designed for MPEG-2, which does not support sub-MBs or skipped \(16\times 16\) MBs, unlike the H.264 codec used here. We experimentally found that the Gaussian kernel performs better than the linear kernel; thus, this kernel change favors the method’s performance.
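For reference, an analogous training setup in scikit-learn might look roughly like the following; the experiments themselves used the Matlab SVM tool, and Matlab's kernel-scale parameter does not map one-to-one onto scikit-learn's gamma, so the candidate values are purely illustrative.

```python
# Analogous (scikit-learn) sketch of the classifier training described above;
# the actual experiments used the SVM tool in Matlab-2020a.
# X: per-GOP feature vectors produced by a steganalyzer's extraction tool.
# y: 0 for cover GOPs, 1 for stego GOPs.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_steganalyzer(X, y):
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # 5-fold cross-validation over candidate kernel parameters (illustrative values).
    grid = GridSearchCV(pipe, {"svc__gamma": [0.1, 0.5, 1.0, 2.0, 5.0]}, cv=5)
    grid.fit(X, y)
    return grid.best_estimator_
```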

As indicated, the proposed steganography scheme is implemented within the OpenH264 encoder, whereas MVMPLO uses the x264 encoder. However, the two implementations follow the same H.264 standard and, when fed the same settings, should produce the same encoded videos. Thus, to ensure a fair comparison, we performed the following:

  1. We applied the same H.264 configuration settings for all encoders in all experiments and used the same video dataset.

  2. We embedded the same number of bits with both MVMPLO and the proposed technique in our comparison against the steganalyzers (see Fig. 14 below).

  3. We trained two versions of each steganalysis method separately, one for each steganography technique. For each steganography scheme (MVMPLO and the proposed one) and steganalysis method, we used identical video encoder configurations to obtain the output videos (without embedding). Then, we applied each steganography scheme to obtain the stego videos. We kept the training settings fixed across the two versions. Thus, we obtain six trained steganalyzers in total (two steganography techniques and three steganalysis methods).

  4. In our comparisons, as we indicate shortly, we used a relative performance metric (for PSNR and bit rate) to measure the difference in metrics between each technique’s original and stego versions.

A final note is that the MVMPLO approach uses the STC framework [30] to select the MVs for modification that minimize a cost function comprising the embedding distortion and accounting for the local optimality features, and hence it achieves minimal detectability with respect to the local optimality steganalyzer. STC requires processing more than \(10^6\) MVs to obtain reliable performance. Therefore, in our experiment, all MVs of each video are extracted first and then processed by STC at once to maximize its security against the steganalyzers (Footnote 4) and achieve the recommended reliable STC performance. For both steganography schemes, we set the quantization parameter (Footnote 5) (QP) to 25.

4.2 Comparison against the MVMPLO technique

This section compares the proposed technique and the MVMPLO scheme with respect to several metrics: the number of embedded bits per video, the detection performance of the steganalyzers used, the reduction in PSNR due to the embedding process, and the running-time overhead of each scheme.

Fig. 14

Number of embedded bits for each video in Table 3 for MVMPLO and the proposed technique

First, as our proposed technique uses rule-based MV selection, the number of embedded bits depends mainly on the video content. In other words, we cannot pre-determine the number of embedded bits in our scheme (this is discussed further in Section 4.4), unlike MVMPLO, which can have a pre-determined number of embedded bits as it can be pre-configured for a particular distortion. Thus, to provide a fair comparison between the two schemes, we first ran the proposed technique and calculated the number of embedded bits, and then embedded the same number of bits per video with MVMPLO. Figure 14 compares the number of embedded bits per video for the proposed technique and the MVMPLO scheme; both schemes embed almost the same number of bits per video.

$$\begin{aligned} \textrm{Recall} = \frac{T_P}{T_P + F_N}. \end{aligned}$$
(1)
$$\begin{aligned} \textrm{Prec} = \frac{T_P}{T_P + F_P}. \end{aligned}$$
(2)
$$\begin{aligned} \mathrm {F_1} = 2\times \frac{\textrm{Recall} \times \textrm{Prec}}{\textrm{Recall} + \textrm{Prec}}. \end{aligned}$$
(3)
$$\begin{aligned} \mathrm {A_{CC}} = \frac{T_P + T_N}{T_P + T_N + F_P + F_N}. \end{aligned}$$
(4)

Second, for each steganalyzer, we treat the features extracted from the original cover videos as negative class samples, while the positive class samples comprise the features of the stego videos. We compute the recall, precision, F1-score, and accuracy according to (1), (2), (3), and (4), respectively, given above. We summarize these values in Table 4. The table shows that the proposed technique achieves almost the same security performance as the MVMPLO approach in attacking the NPELO technique while greatly outperforming the MVMPLO approach in attacking the consistency-based steganalyzer [38]. Additionally, although both steganographic schemes neutralize the coherency steganalyzer [27], the proposed technique achieves better results as it keeps the \(16\times 16\) MBs intact, unlike the MVMPLO approach.
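For completeness, a small helper computing (1)-(4) from the raw confusion-matrix counts is shown below; the example counts are illustrative, not our experimental results.

```python
# Compute Recall, Precision, F1-score and Accuracy, i.e. (1)-(4), from raw counts.
def classification_metrics(tp, tn, fp, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}

print(classification_metrics(tp=40, tn=38, fp=6, fn=4))   # illustrative counts only
```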

Table 4 Results of steganalyzers against MVMPLO and the proposed technique
Fig. 15

Percentage bit rate variations (represented by \(\Delta _{100}\) score) of the proposed technique and MVMPLO

Both steganographic schemes achieve relatively high security performance in attacking the coherency steganalyzer due to a particular MB type in the H.264 encoder called the skipped \(16\times 16\) MB. The H.264 standard utilizes the skipped \(16\times 16\) MB type to achieve higher compression by excluding MBs with coherent neighbors from the output bitstream; instead, the decoder estimates these MBs from the surrounding MBs, which reduces the overall video size. However, this poses a significant challenge for MV-coherency steganalyzers when applied against steganographic methods implemented within the H.264 encoder that retain the motion vectors (MVs) associated with the skipped \(16\times 16\) MBs and operate only on the non-skipped \(16\times 16\) MBs. The reason is that the MVs of the non-skipped \(16\times 16\) MBs correspond to the typically irregular (incoherent) portion of the motion field within the video frame. Hence, this greatly reduces the detection performance of MV-coherency steganalysis methods like [27], as shown in Table 4.

It is also worth mentioning that the proposed technique performs slightly worse than MVMPLO in attacking the NPELO technique. This behavior can be explained by Fig. 4, which shows the flowchart for MB compatibility checking for embedding. The proposed technique may violate the local-optimality constraint when the value of the MV at a certain Q-Pel does not match the Msg: the encoder then discards the Q-Pel search result and returns to the H-Pel or F-Pel, which with some probability may not be locally optimal. In practice, however, this results only in a small performance drop, as shown in Table 4.

Fig. 16

Percentage PSNR variations (represented by \(\Delta _{100}\) score) of the proposed technique and MVMPLO

Third, we compare the proposed technique with the MVMPLO regarding the embedding effect on both the bit rate and PSNR. We demonstrate PSNR and bit rate variations between the cover and stego videos for our setup using the \(\Delta _{100}\) score described as

$$\begin{aligned} \Delta _{100} (x,y) = 100\times \frac{x-y}{y}. \end{aligned}$$
(5)

Using (5), we substitute y with the original cover value (PSNR or bit rate) and x with the corresponding stego value. We present this comparison in Figs. 15 and 16 for bit rate and PSNR, respectively. The figures show that the effect of the proposed technique on PSNR and bit rate is almost the same as that of MVMPLO.
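For instance, applying (5) to a hypothetical pair of cover/stego values behaves as follows; the numbers are for illustration only.

```python
# Delta_100 score of (5): percentage change of a stego-video metric x relative
# to its cover value y. The numbers below are hypothetical, for illustration only.
def delta_100(x, y):
    return 100.0 * (x - y) / y

print(delta_100(x=36.9, y=37.1))      # PSNR change of about -0.54 %
print(delta_100(x=1530.0, y=1500.0))  # bit rate change of +2.0 %
```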

Finally, we compare the time overhead of the proposed and MVMPLO schemes. The proposed technique adds only a small overhead (\(1-2\%\)) to the encoder running time, which indicates that it preserves the real-time performance of the encoder. In contrast, the MVMPLO scheme uses STC coding to minimize the embedding distortion. Using STC reliably requires processing more than \(10^6\) MVs to select the suitable MVs for modification [30], and the new residuals corresponding to the modified MVs must then be re-calculated; hence, it cannot meet the real-time constraint. In our experiment, all MVs (Footnote 6) of each video are extracted and processed by the STC encoder to maximize its security, ensuring a fair comparison with the proposed technique.

4.3 Additional comparative analysis

Here, we extend our experimental comparison by cloning the experimental setup of [39]. Specifically, we used the same dataset and H.264 coding parameters as in [39] and applied this setup to the proposed technique, which allows us to use the results reported in [39] to provide a further, more comprehensive comparison between the proposed technique and recent steganographic techniques in the literature.

As the real-time design of the proposed technique does not allow multiple predefined embedding rates (as indicated in the previous subsection), we have chosen the specific entries of Table 1 and Fig. 3 in [39] that match the average bpnsmv (bits per non-skipped motion vector) obtained by our proposed technique.

Table 5 provides a performance comparison between the proposed technique and five techniques listed in [39]. The methods Tar1, Tar2, and Tar3 in the table refer to [18], MVMPLO, and [32], respectively, while dMVC and dMVC+LO refer to the techniques proposed in [39]. The scores of the steganalyzers (NPELO [25] and MVC [38]) are represented in terms of the minimum average prediction error (as described in [39]), not the accuracy. Finally, \(\Delta Bitrate\) represents the percentage change of the bit rate due to embedding. It can be concluded from Table 5 that the proposed technique achieves acceptable security and performance margins while being the only one that preserves the real-time constraints. It should be noted that the coding performance (PSNR and \(\Delta Bitrate\)) of OpenH264 is slightly lower than that of JM 19.0 [54] (utilized in [39]) due to the optimizations described in Section 3.3.

Table 5 Comparison between the proposed technique against other techniques using Table 1 and Fig. 3 in [39]

4.4 Discussion

Generally speaking, two challenges are associated with achieving real-time performance in MV-based video steganography. First, selecting the best MVs to be modified is performed through a cost function that typically requires the whole frame or GOP, as in [31, 32, 34, 35, 39, 55]. Second, as MV-based video steganography techniques modify the MVs to insert the secret data, the modified MVs consequently require re-encoding to calculate the new DCT residuals; otherwise, the quality of the resulting videos is severely affected. Thus, performing MV-based video steganography in real time is challenging. To our knowledge, the only available solution that achieves real-time performance with MV-based video steganography is the scheme in [15]. That scheme exploits the partition modes (PMs) and establishes a mapping rule between message bits and the PMs that allows modifying the PMs according to the message bits. Accordingly, the scheme forces the choice of the partition modes during the ME process of selected frames according to the message bits, where a frame is selected for this modification only if it contains a scene change. However, as shown in [38], the modifications performed by this scheme are easily detected by the consistency steganalyzer, even for a very low embedding capacity (detection accuracy of \(93.83\%\) at 0.05 bits per MB). Although the proposed technique bears some similarity to the scheme in [15] in establishing a mapping rule between message bits and the modifications performed to the MVs, our scheme shows excellent performance against the consistency steganalyzer, thanks to our carefully designed compatibility criteria.

Achieving the real-time constraints with the proposed technique comes at a price: the proposed technique cannot pre-determine the number of embedded bits. To meet the real-time constraint, it performs the embedding on the fly, i.e., without waiting for the whole video or a large number of frames to be available. Specifically, the proposed technique operates per MB in the ME sub-pixel-refinement stage without waiting for the whole frame or GOP and without performing any additional ME or re-coding step. Therefore, we cannot set a specific number of bits to be embedded in advance. In comparison, other techniques in the literature can pre-determine the number of embedded bits because they either employ an optimization strategy to select the best modifications over several frames or a GOP to meet this pre-determined number of bits, or they re-perform the ME after the modifications. Despite the ability to commit to a certain number of bits, these techniques cannot perform such embedding strategies in real time due to the waiting time for several frames or the GOP.

Another point worth highlighting is that the proposed technique keeps the encoder’s most resource-consuming processes intact. Specifically, according to [56], ME and (sub-pixel) interpolation consume about \(68.7\%\) and \(18.6\%\), respectively, of the whole encoder resources, for a total of \(87.3\%\) for the complete ME process. Although the proposed approach operates within the ME and sub-pixel refinement stage, the embedding process affects only the final decision; no interaction with SAD, SATD, or the interpolation filters is performed. That is why the resulting resource overhead is only \(1-2\%\).

Finally, the proposed technique does not allow embedding in non-optimal MVs, as no error-coding scheme such as the STC framework is employed at the MB level; thus, it is more restrictive than MVMPLO. However, the proposed technique can embed up to 3 bits per MV, which relaxes its restrictive nature. As shown in Section 4.2, even when embedding 2 bits per MV, the proposed approach has an embedding capacity comparable to that of the MVMPLO approach, and this capacity is expected to increase when embedding 3 bits per MV instead of 2.

5 Conclusion

In this paper, a new approach for real-time video steganography has been introduced. The proposed technique integrates smoothly into the H.264 standard and achieves outstanding performance against state-of-the-art steganalysis techniques while maintaining real-time encoding constraints. The proposed technique embeds the secret message by altering the motion vectors (MVs) while preserving their local optimality, coherency, and consistency to withstand recently emerged steganalysis methods. The proposed technique is implemented within the OpenH264 encoder, and the experimental results demonstrate that it offers excellent performance in attacking local optimality, coherency, and consistency steganalyzers compared with state-of-the-art techniques. Specifically, the proposed technique significantly reduces the performance of the steganalyzers’ classifiers in terms of precision, recall, and accuracy. Additionally, the proposed technique performs the embedding in real time and adds only a small overhead (\(1-2\%\)) to the encoder running time.

In the future, we will consider an important practical application of the proposed technique. Specifically, we intend to implement the proposed technique as an extension of WebRTC [57]. This can add real-time steganography capabilities to video-communication solutions built on top of open web standards.