1 Introduction

Technology and its continued development are an inevitable part of daily life. Recent developments have led to cheaper storage, better-quality cameras, and inexpensive sensors. Together, these have given rise to the Internet of Things paradigm, which in turn has increased the total amount of data being transferred and stored, including sensitive information such as personal details, medical information, medical images, and banking details. Information security has been a topic of research since the beginning of digital communication. Information hiding techniques are used to counter attacks on data and to provide security, privacy, confidentiality, and integrity [3].

Information hiding techniques are broadly categorized into cryptography, digital watermarking, and steganography. Cryptography [50, 108] is a popular information hiding method that encrypts plain data into cipher data, which is later decrypted back into plain data. Digital watermarking is a basic technique for hiding watermarks such as company logos and trademarks to claim authorship and ownership. Even when cryptography and digital watermarking are difficult to break, the encrypted message itself remains visible to the Human Visual System (HVS), so its existence is not concealed. Steganography, on the other hand, hides the secret information inside a carrier without leaving traces perceptible to the HVS [9].

Steganography is not new and has existed since ancient times. Before the digital era, hidden messages were transferred using methods such as writing on a slave’s shaved head, invisible inks, wax, and silk. As digital media developed, steganography methods evolved as well. Based on the digital medium used as the carrier, technical steganography can be divided into image, audio, text, and video steganography [48]. Steganography is commonly modeled as the prisoners’ problem, in which Alice and Bob are inside a prison [10]. Eve is the warden who oversees all communication between Alice and Bob. Alice and Bob are planning to escape, and to pursue the plan they have to communicate in a way that does not raise Eve’s suspicion. Cryptography and digital watermarking protect only the content of the information in this situation; Eve would still become aware of the secret communication. Steganography is the only option for Alice and Bob to communicate without creating suspicion. The escape plan is hidden inside a normal-looking image that is exchanged between Alice and Bob, and Eve sees only the normal-looking cover image. Figure 1 explains the overall workflow of steganography and steganalysis from the perspectives of Alice, Bob, and Eve.

Fig. 1

Steganography reduced to the prisoners’ problem. Alice and Bob communicate with each other using steganography methods. Eve does not suspect their secret communication since the secret information is not visible to HVS

Video steganography is the process of hiding secret information inside videos. The secret information can be any medium, such as text, audio, images, video, or a binary file, and the carrier video can be raw or compressed in any format. A detailed classification of video steganography methods based on different criteria is given in Fig. 2. The first level of classification is based on the format of the cover video, which is either in the raw domain or in the compressed domain. Raw domain methods are further classified into spatial domain and transform domain methods. Least Significant Bits (LSB) substitution and other significant methods belong to the spatial domain. Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) are the most extensively used transforms for converting cover videos into the transform domain, after which the secret information is embedded. In the compressed domain, video steganography uses compressed cover videos. Compressed videos require less storage space than raw videos, and the embedding happens during or after compression. Methods based on motion vectors, intra-prediction modes, entropy coding modules, and DCT/DST coefficients are the most extensively used in the compressed domain.

Fig. 2

Hierarchical classification of video steganography methods

Video steganography is applied in domains where covert communication is required. Popular application areas include intelligence agencies, the military, the medical sector, and multimedia. Intelligence agencies prefer covert communication both within and outside the agency, and video steganography is widely used here because it hides the very existence of the secret message from an attacker. Similarly, military organizations widely use steganography techniques to conceal their communication, because unauthorized disclosure of secret data can create national security issues. The medical sector has also benefitted from video steganography. Advances in the health sector have led to patient information being stored in digital form. This information is stored in the cloud and can be transferred to the respective patients or authorized health care providers over the internet. Transferring medical data over the internet is a critical problem, since data loss caused by cyberattacks can negatively affect patient health. The medical sector uses video steganography to conceal private information from unauthorized entities when it is transferred over communication channels. In addition, video steganography is used to preserve the privacy of authorized individuals detected in video sequences captured by surveillance cameras: the data of these individuals are embedded inside the surveillance video sequences.

The main characteristics of any steganography method are imperceptibility, security, robustness, and hiding capacity, and there is always a trade-off between security, robustness, and hiding capacity. The most popular method for image, audio, and video steganography is Least Significant Bits (LSB) substitution. Traditional video steganography methods are simple, effective, and quick. However, overloading the carrier increases the hiding capacity at the cost of security and robustness [69, 103]. Compressed domain methods offer better security and robustness than raw domain methods, and compressed videos require less storage space than raw videos. However, some useful redundant data may be lost during compression. Recently, deep learning methods have also been applied to steganography and have produced exceptional results. Deep learning methods have increased hiding capacity, security, and imperceptibility, but they are time-consuming and complex.

Only a few notable review articles on video steganography have been published in the past [7, 69, 86, 96, 103, 105]. Most of these surveys are very brief and concentrate mainly on data hiding in either the raw or the compressed domain. In addition, they consider only video steganography and do not discuss video steganalysis, which is equally important. This paper reviews the video steganography methods proposed over the past two decades. The methods are comprehensively reviewed, grouped, and summarized. In addition to the methodology, the existing research gaps and the challenges faced are delineated, and further analysis points out future directions. A brief introduction to video steganalysis is also provided.

2 Video steganography in raw domain

Raw domain-based video steganography methods treat the cover video as a sequence of frames and apply the data embedding operation to each frame separately. The general data embedding procedure in the raw domain is shown in Fig. 3.

Fig. 3

Steganography procedure in raw video domain

Initially, the cover video sequence is split into multiple frames. The secret data is then hidden inside the frames using various methods. In the raw domain, the secret data is either embedded directly in the spatial domain of the cover frame, or the cover frame is transformed into the frequency domain and the secret data is embedded there. Before embedding, the secret data is subjected to preprocessing. Many methods apply encryption techniques, error-correcting codes, and similar steps to preprocess the secret data. This preprocessing is intended to keep the secret data secure even if the cover video suffers attacks or frame drops during transmission. Raw domain-based methods can be classified into two types: data hiding in the spatial domain and data hiding in the transform domain.

2.1 Data hiding in spatial domain

Spatial domain data hiding techniques use the pixel values of the cover frame to hide the secret data; that is, the secret data bits are embedded directly into the pixel intensity values. Least significant bits (LSB) substitution, in which secret data bits are embedded into the least significant bits of the cover pixel intensities, is the prevalent method for data hiding in the spatial domain. This section discusses the various LSB and other spatial domain methods proposed in the literature for data hiding in videos.

2.1.1 Least significant bits methods

Least Significant Bits based methods are the most commonly used algorithms for image, audio, and video steganography. LSB methods are simple and effective. They are usually described as k-LSB substitution methods, where k stands for the number of secret bits that can be hidden in each cover sample. The value of k depends on the embedding algorithm. The hiding capacity of the embedding algorithm depends on the number of bits (k) that can be manipulated in the cover video. Increasing the number of bits used for hiding increases the hiding capacity but overburdens the carrier, which can expose the secret information.

A single pixel in a frame of the cover video has 8 or 24 bits, depending on the format of the video. Grayscale frames have 8 bits per pixel, whereas color frames have 24 bits: a color frame consists of 3 channels (RGB), and each channel has 8 bits per pixel. Similarly, an RGB secret frame has 8 bits per pixel in each of its 3 channels. The least significant bits of the cover frame are replaced with the most significant bits of the secret frame. During extraction, the LSBs of the stego frame are taken and padded with 0s to obtain an approximation of the intended secret information. Different secret media and different levels of hiding capacity are considered, and the MSE, PSNR, and SSIM values of the different combinations are given in Table 1. Table 1 shows that the PSNR value decreases with increasing hiding capacity. Imperceptibility, security, and robustness also decrease with increased hiding capacity. However, the computational time is similar for all combinations, even with increased hiding capacity. There is always a trade-off between the hiding capacity and the imperceptibility, security, and robustness of a system.
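The substitution and zero-padded extraction described above can be sketched as follows. This is a minimal illustration, assuming 8-bit samples stored in plain Python lists; the function names (`embed_msb_in_lsb`, `extract_secret`, `psnr`) and the toy sample values are hypothetical and are not taken from any of the surveyed methods or from Table 1.

```python
import math

def embed_msb_in_lsb(cover, secret, k):
    """Replace the k LSBs of each cover sample with the k MSBs of the secret sample."""
    keep_mask = 0xFF & ~((1 << k) - 1)          # clears the k LSBs of an 8-bit sample
    return [(c & keep_mask) | (s >> (8 - k)) for c, s in zip(cover, secret)]

def extract_secret(stego, k):
    """Read the k LSBs of each stego sample and pad with zeros on the right."""
    return [(p & ((1 << k) - 1)) << (8 - k) for p in stego]

def psnr(original, modified):
    """Peak signal-to-noise ratio in dB for 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)

# Toy data (assumed values, one 8-bit sample per entry).
cover = [52, 200, 131, 77, 18, 240, 99, 163]
secret = [255, 0, 170, 85, 204, 51, 15, 240]

for k in (1, 2, 4):
    stego = embed_msb_in_lsb(cover, secret, k)
    recovered = extract_secret(stego, k)
    print(k, round(psnr(cover, stego), 2), recovered)
```

On this toy data the PSNR drops as k grows, which mirrors the capacity versus imperceptibility trade-off discussed in the text; the recovered samples keep only the top k bits of the secret.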

Table 1 Comparative analysis of different hiding capacity using the LSB method with Lena as the cover object

In the last two decades, different variants of the LSB-based substitution technique [8, 21, 22, 29, 60, 72, 115, 124] have been used for hiding data in video streams. Ramalingam [99] proposed a software tool named “stego machine” for hiding secret text files inside a video. The approach uses the traditional 1-LSB method, i.e., only the single least significant bit of each pixel value in the cover frame is used for embedding the data. The four least significant bits (4-LSB method) of the cover frame are used for hiding the secret data in [34]. Before embedding, the secret data (frame) is partitioned using the non-uniform rectangular partition algorithm, and the resulting grids of the secret data are embedded in the cover frame. In the last decade, most LSB-based approaches have included cover pixel selection, secret data encoding, and encryption techniques to improve the security and robustness of the data hiding algorithm. “HASH LSB” [17], an extended version of the LSB approach, integrates a hash function with the LSB substitution method. The scheme follows a ‘3-3-2’ embedding pattern to hide the secret data in the cover image. For a cover pixel, the ‘3-3-2 LSB pattern’ uses 3 LSBs of the red component, 3 LSBs of the green component, and 2 LSBs of the blue component for hiding the data. The hash function is employed to select suitable bit positions in the LSBs of the pixels for embedding the data. The “HASH LSB” technique is combined with the RSA encryption scheme in [40], where the RSA algorithm encrypts the secret data before it is hidden using the ‘HASH LSB’ embedding scheme.
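The ‘3-3-2’ pattern can be illustrated with a short sketch that hides one secret byte per RGB pixel. It assumes that the three most significant bits of the byte go to red, the next three to green, and the last two to blue; this bit ordering and the helper names `embed_332` and `extract_332` are illustrative assumptions, and the hash-based selection of bit positions used in [17] is deliberately left out.

```python
def embed_332(pixel, byte):
    """Hide one byte in an (R, G, B) pixel using 3, 3 and 2 LSBs respectively."""
    r, g, b = pixel
    r = (r & 0b11111000) | (byte >> 5)             # top 3 bits -> 3 LSBs of red
    g = (g & 0b11111000) | ((byte >> 2) & 0b111)   # next 3 bits -> 3 LSBs of green
    b = (b & 0b11111100) | (byte & 0b11)           # last 2 bits -> 2 LSBs of blue
    return (r, g, b)

def extract_332(pixel):
    """Rebuild the hidden byte from the LSBs of an (R, G, B) pixel."""
    r, g, b = pixel
    return ((r & 0b111) << 5) | ((g & 0b111) << 2) | (b & 0b11)

message = b"Hi"
cover = [(120, 64, 200), (33, 250, 7)]             # toy pixels (assumed values)
stego = [embed_332(p, byte) for p, byte in zip(cover, message)]
recovered = bytes(extract_332(p) for p in stego)
print(stego, recovered)
```

Giving blue only two bits is the usual motivation stated for this pattern in the text: each pixel still carries a whole byte while the per-channel distortion stays below 8 gray levels for red and green and below 4 for blue.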

A significant number of methods apply encryption and preprocessing (including encoding the secret message with error-correcting codes) to the secret data before embedding it inside the cover frame. Encryption and computer forensics are employed with the 4-LSB method for data embedding in [80]. Although encryption can provide additional security for the data, the method is more prone to attack, because 4-LSB substitution can cause significant visual degradation of the cover video after the secret data is embedded. Yadav et al. [131] focused on improving the security of the secret data in the LSB-based data hiding approach by introducing an encryption scheme, in which an XOR operation between the secret data and a secret key encrypts the secret data before embedding.
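The XOR-based encryption step can be sketched in a few lines. This is a generic repeating-key XOR, offered only as an illustration of the idea; the exact key handling in [131] is not specified here, and the function name `xor_cipher` and the sample key are assumptions.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

secret = b"escape at dawn"
key = b"\x5a\xc3\x17"                      # hypothetical shared secret key
encrypted = xor_cipher(secret, key)        # this is what would be embedded
decrypted = xor_cipher(encrypted, key)     # the same operation restores the data
print(encrypted.hex(), decrypted)
```

Because XOR is its own inverse, the receiver runs the same function with the same key after extracting the bits from the stego video.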

Mstafa et al. [81] introduced error-correcting codes along with encryption in the LSB-based data hiding scheme. The approach converts the cover frame into the YCbCr color space and uses the Y, U, and V components for hiding the data. Initially, the pixel positions of the Y, U, and V components in the cover frames are altered using a specific key. The secret data is encoded using a Hamming code, and the resulting encoded data is further encrypted with another key by applying the XOR operation. In each selected pixel of the cover frame, 7 bits of encoded and encrypted secret data are embedded: 3 bits in the Y component and 2 bits each in the U and V components. Instead of a Hamming code, BCH codes are used for encoding the secret data in Mustafa et al. [84]. That work follows the same color space conversion and pixel position alteration scheme introduced in [81]. After that, the bit positions of the secret data are changed using a private key, and the secret data is then encoded using BCH codes. In each cover pixel, 8 bits of secret data are embedded following a 3-3-2 LSB pattern (3 bits in Y, 3 bits in U, and 2 bits in V). Integrating error-correcting codes with encryption significantly improved the security of these data hiding methods.
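The combination of a Hamming code, XOR encryption, and a 3-2-2 bit split can be sketched as below. The sketch assumes a standard Hamming(7,4) code with parity bits at positions 1, 2, and 4, which fits the 7 bits per pixel mentioned above but is not confirmed as the exact code used in [81]; the key bits, pixel values, and all function names are hypothetical, and the key-based pixel position alteration is omitted.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3          # 1-based error position, 0 if no error
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def bits_to_int(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def int_to_bits(value, width):
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

def embed_322(yuv, codeword):
    """Write 3 bits into Y and 2 bits each into U and V (LSB substitution)."""
    y, u, v = yuv
    y = (y & 0xF8) | bits_to_int(codeword[0:3])
    u = (u & 0xFC) | bits_to_int(codeword[3:5])
    v = (v & 0xFC) | bits_to_int(codeword[5:7])
    return (y, u, v)

def extract_322(yuv):
    y, u, v = yuv
    return int_to_bits(y & 0b111, 3) + int_to_bits(u & 0b11, 2) + int_to_bits(v & 0b11, 2)

data_bits = [1, 0, 1, 1]
key_bits = [1, 0, 1, 1, 0, 0, 1]                      # hypothetical 7-bit XOR key
codeword = hamming74_encode(data_bits)
encrypted = [c ^ k for c, k in zip(codeword, key_bits)]
stego = embed_322((143, 90, 201), encrypted)

damaged = (stego[0] ^ 1, stego[1], stego[2])          # simulate one flipped bit in Y
received = [c ^ k for c, k in zip(extract_322(damaged), key_bits)]
decoded = hamming74_decode(received)
print(decoded)
```

A single flipped bit survives the XOR decryption as a single-bit error, so the Hamming decoder can still correct it; this is the property that makes the encode-then-encrypt ordering workable.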

The LSB substitution-based method in [1] employs a metaheuristic optimization algorithm, namely ‘cuckoo search’, for preprocessing the secret data. The cuckoo search algorithm processes the secret message byte by byte and arranges the bits of each byte in five distinct forms before embedding them inside the cover frame. In the cover frame, the Euclidean distance is used to find the appropriate pixel for embedding the secret data. Furthermore, a Lévy flight random walk is employed to traverse from the current cover pixel to the next one. The secret data is hidden in the cover pixel following the 3-3-2 LSB embedding pattern. Jha et al. [37] proposed an extended version of LSB for hiding a video sequence inside another video sequence. Before the secret frame is embedded inside the cover frame, the pixels of the selected cover frame are scrambled using the prime factorization technique. The bits of the secret frame are then embedded into the cover frame using the spiral LSB technique. Khan et al. [46] focused on hiding the secret data with the LSB approach in the keyframes of the cover video sequence. Statistical features such as standard deviation, skewness, and kurtosis are used to extract the keyframes from the cover video. Moreover, the AES algorithm is used to encrypt the secret message before embedding.

To improve the robustness and security of LSB-based data hiding techniques, various adaptive steganography approaches have been proposed in the literature. An adaptive steganography approach hides the secret data in a specific predefined region of interest (moving objects, skin regions, etc.) in the cover frame. In [43], the edges of objects in the cover frame are chosen as the predominant location for hiding the secret data. The Canny edge detection technique is used to detect the edges, and the 4-LSB method is used to embed the secret data in the detected edge pixels of the cover frame. Non-edge pixels are also used for hiding data, with the 2-LSB method. Moreover, the RSA algorithm provides additional security by encrypting the secret data. Background objects and foreground objects (except faces) in the cover frame are selected as the region of interest for hiding medical images in [6]. The human vision region of interest is used to classify the background and foreground objects in the cover frame, and it is determined from the motion attention index value and variation range. Moreover, the AES mechanism is used to encrypt the medical images. Mstafa et al. [88] proposed a 4-LSB method to embed the secret data in corner points of the cover frame. The Shi-Tomasi algorithm is used to detect regions of corner points within the cover video frames, and the secret data is encrypted using Arnold’s cat map technique. The results show that the approach is robust against artificial noise.

In [78], the cover video is treated as a sequence of frames, and each cover frame is partitioned into four quadrants. Each byte of the secret data is divided into 4 pairs of bits, and one pair is embedded in each quadrant of the cover frame. The LSB substitution-based approach proposed in [39] introduced a cover frame selection mechanism based on DNA alphabets. To improve the security of the embedded data, not all pixels of the selected cover frame are chosen for embedding the secret data. Instead, suitable pixels are selected by constructing a Burger chaotic map for pixel plotting over the cover frame, followed by randomization of the pixel points obtained from the chaotic map using a random number generator. A linear congruential generator is used to generate the random numbers. Younus et al. [134] used the knight’s tour algorithm to select suitable random pixels in the cover frame for hiding the secret data. The knight’s tour algorithm is based on the movement of the knight on a chessboard. Once the suitable pixels are selected, the secret data is hidden in the last 2 LSBs of the cover pixel.
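Key-dependent pixel selection of this kind can be sketched with a linear congruential generator (LCG) that decides the order in which pixels receive 2-bit pairs of the secret bytes. The sketch is a simplified stand-in: the chaotic map stage of [39] and the knight's tour of [134] are not reproduced, the LCG constants (`a = 5`, `c = 3`, power-of-two modulus) are assumptions chosen so the generator has a full period, and the function names are hypothetical.

```python
def lcg_pixel_order(num_pixels, seed, count):
    """Return `count` distinct pixel indices in a seed-dependent pseudo-random order."""
    if count > num_pixels:
        raise ValueError("not enough pixels for the requested payload")
    m = 1
    while m < num_pixels:
        m <<= 1                      # smallest power of two >= num_pixels
    a, c = 5, 3                      # a % 4 == 1 and odd c give a full period for m = 2**n
    state = seed % m
    order = []
    while len(order) < count:
        state = (a * state + c) % m
        if state < num_pixels:       # skip indices that fall outside the frame
            order.append(state)
    return order

def embed(pixels, message, seed):
    """Hide each byte as four 2-bit pairs in the 2 LSBs of the selected pixels."""
    pairs = [(byte >> shift) & 0b11 for byte in message for shift in (6, 4, 2, 0)]
    order = lcg_pixel_order(len(pixels), seed, len(pairs))
    stego = list(pixels)
    for idx, pair in zip(order, pairs):
        stego[idx] = (stego[idx] & 0b11111100) | pair
    return stego

def extract(stego, num_bytes, seed):
    order = lcg_pixel_order(len(stego), seed, num_bytes * 4)
    pairs = [stego[idx] & 0b11 for idx in order]
    return bytes(
        (pairs[i] << 6) | (pairs[i + 1] << 4) | (pairs[i + 2] << 2) | pairs[i + 3]
        for i in range(0, len(pairs), 4)
    )

pixels = list(range(50, 150))        # toy grayscale frame with 100 samples
stego = embed(pixels, b"key", seed=7)
print(extract(stego, 3, seed=7))
```

Because a full-period LCG visits every residue exactly once per cycle, no pixel is selected twice, and a receiver who knows the seed regenerates the same visiting order.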

The LSB substitution method is integrated with a patch-wise code formation technique for hiding a video inside another video in [97]. Initially, the cover frame is preprocessed using fuzzy adaptive median filtering to remove impulse noise. The redundancies in the frame are then removed using a pixel clustering technique. The secret data is embedded in the least significant bits of the cover pixels. After embedding, the cover video is transformed into an encoded format using the patch-wise code formation technique, which is included to improve security and reduce transmission time.

2.1.2 Other spatial domain methods

Most video steganography techniques for raw videos in the spatial domain rely on LSB substitution to embed the secret data. Besides these, a few works proposed in the last decade use non-LSB methods to hide the secret data in the spatial domain of raw videos. Jangid et al. [36] used K-means clustering and LBP features to embed the secret data. The cover frames are converted into the Lab color space, and the K-means clustering algorithm groups the cover frames into different clusters. Only selected clusters of the cover frames are chosen for embedding the secret data, and the LBP methodology is used to hide the secret data in the selected cluster. The evaluation results show that the method achieved better imperceptibility than a method that embedded the secret data in the transform domain (IWT) using the LSB approach. An adaptive steganography approach [47] used the Cb component of the cover frames in the YCbCr color space for embedding the secret data. First, the skin regions (faces) in the cover frames are detected. The skin region detection involves converting the RGB frames to the HSV color space, followed by morphological dilation and a filling operation. The frames containing skin regions are transformed to the YCbCr color space, and the frame with the least MSE value is chosen for embedding the secret data.

Blocks of the cover frame that have nonuniform colors are exploited for embedding the secret data in [13]. A regional histogram optimization technique is used to find the appropriate cover pixels (the blocks with nonuniform colors). The method divides a cover frame into multiple blocks and examines the histogram dispersion of each block to find the blocks that have uniform colors. The blocks with uniform colors are excluded, and the remaining blocks are used to embed the secret data. Although the approach is simple, it delivered acceptable imperceptibility. A reversible lossless data hiding technique based on a histogram distribution constrained scheme is proposed in [4]. The approach uses the luminance component of raw video frames for embedding the data. First, the luminance component of the video frame is extracted and separated into multiple non-overlapping blocks. The arithmetic difference of each block is then calculated, and the secret data is placed into the blocks by shifting the arithmetic difference values. The experimental results show that the method is robust against H.264/AVC compression.

Kelash et al. [45] used the average histogram values of the cover frames to determine suitable frames of the cover video sequence for embedding the secret data. Initially, the histogram variation of each frame is computed, and the frames whose variation is greater than the histogram constant value are selected for hiding the data. Each selected frame is broken down into blocks, and suitable pixels are selected by comparing consecutive blocks. Each suitable pixel is divided into two parts: the data is embedded in the right part of the pixel, and the count of the bits altered during embedding is encoded in the left part. Ramalingam and Isa [101] proposed a data hiding scheme that embeds the secret data in random RGB components of the cover frame. The pixels of the cover frame are randomly permuted using a random key (seed) and a pseudo-random number generator. Every 8 bits of the secret message are embedded in a random pixel following the specific order “RGBBGRGG”. That is, the first and sixth bits of the secret message are embedded in the red component of the pixel, the third and fourth bits in the blue component, and the remaining bits (second, fifth, seventh, and eighth) in the green component.
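The “RGBBGRGG” ordering combined with a seeded pixel permutation can be sketched as follows. The sketch assumes that successive bits assigned to the same channel occupy successive LSB positions, starting from the lowest one; that detail, the use of Python's `random.Random` as the pseudo-random generator, and all names are illustrative assumptions rather than the scheme of [101].

```python
import random

CHANNEL_ORDER = "RGBBGRGG"                 # channel that receives message bits 1..8
CHANNEL_INDEX = {"R": 0, "G": 1, "B": 2}

def embed_byte(pixel, byte):
    """Spread the 8 bits of `byte` over the LSBs of an (R, G, B) pixel."""
    channels = list(pixel)
    used = [0, 0, 0]                       # LSB positions already written per channel
    for i, name in enumerate(CHANNEL_ORDER):
        bit = (byte >> (7 - i)) & 1        # message bits are taken MSB first
        ch = CHANNEL_INDEX[name]
        pos = used[ch]
        channels[ch] = (channels[ch] & ~(1 << pos) & 0xFF) | (bit << pos)
        used[ch] += 1
    return tuple(channels)

def extract_byte(pixel):
    used = [0, 0, 0]
    byte = 0
    for name in CHANNEL_ORDER:
        ch = CHANNEL_INDEX[name]
        byte = (byte << 1) | ((pixel[ch] >> used[ch]) & 1)
        used[ch] += 1
    return byte

def pixel_order(num_pixels, seed):
    """Seeded pseudo-random permutation of the pixel indices."""
    order = list(range(num_pixels))
    random.Random(seed).shuffle(order)
    return order

def embed_message(pixels, message, seed):
    stego = list(pixels)
    for idx, byte in zip(pixel_order(len(pixels), seed), message):
        stego[idx] = embed_byte(stego[idx], byte)
    return stego

def extract_message(stego, length, seed):
    order = pixel_order(len(stego), seed)[:length]
    return bytes(extract_byte(stego[idx]) for idx in order)

frame = [(i, 255 - i, (3 * i) % 256) for i in range(16)]   # toy 16-pixel frame
stego_frame = embed_message(frame, b"plan", seed=2024)
print(extract_message(stego_frame, 4, seed=2024))
```

With this ordering red and blue each carry 2 bits and green carries 4 bits per pixel, so the seed acts as the shared key that tells the receiver which pixels to read.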

Most of the LSB and other spatial domain methods discussed in this section achieve acceptable imperceptibility as well as data hiding capacity. However, their robustness is a concern, and many of them do not include any quantitative analysis of robustness. Most spatial domain-based methods are prone to steganalysis attacks and are not robust against compression or noise attacks. A critical analysis of the different spatial domain steganography methods is given in Table 2.

Table 2 Critical analysis of video steganography methods in spatial domain

2.2 Data hiding in transform domain

Unlike spatial domain-based methods, which embed the secret data directly in the raw pixel intensities of the cover frame, transform domain-based methods convert the blocks of cover frames from the spatial domain to the transform domain. The secret data is then embedded in the least significant bits of the transform coefficients. The general workflow of data hiding in the transform domain is shown in Fig. 4. The discrete wavelet transform (DWT) and the discrete cosine transform (DCT) are the two predominantly used transform functions in video steganography [53, 83, 94, 117]. This section provides a general description of the DWT and DCT functions and discusses the methods that use them for video steganography.

Fig. 4

Data hiding in transform domain: A general workflow

2.2.1 Discrete wavelet transform

A signal can be represented in either the time domain or the frequency domain, and each domain captures different features of interest. In a stationary signal, the frequency components do not change with time, whereas in a non-stationary signal the frequency changes over time. The wavelet transform represents signals jointly in time and frequency and is therefore effective for non-stationary signals, while the Fourier transform is commonly used for stationary signals. Generally, the time domain information is passed through low pass and high pass filters at different levels to decompose the information. Similarly, the frequency domain information is captured by decomposing the depth of the signals. The wavelet transform can be either continuous or discrete. The discrete wavelet transform is of interest in image processing tasks because it is simple, operational, and effective.

Discrete Wavelet Transform (DWT) decomposes the signal into sets with significant and insignificant information. The significant information is related to general appearance and is called low-frequency DWT coefficients. Similarly, the insignificant information represents the behavior of the signals and is called the high-frequency coefficients. A single signal is passed through a set of filters and decomposed into two parts - approximation and details.

The rows and columns of an r × c image are passed and processed independently. The formula used for decomposing the rows and columns are given in (1) and (2) respectively.

$$ i(x,y) = \begin{cases} \sum_{m=0}^{n-1} I(r,m)\, h_{L}(m-r), & r \equiv 0 \pmod{2} \\ \sum_{m=0}^{n-1} I(r,m)\, h_{H}(m-r), & r \equiv 1 \pmod{2} \end{cases} $$
(1)

where I(r,c) is the image, hL is the low pass filter and hH is the high pass filter.

$$ i^{\prime}(x,y) = \begin{cases} \sum_{m=0}^{n-1} i(m,c)\, h_{L}(m-c), & c \equiv 0 \pmod{2} \\ \sum_{m=0}^{n-1} i(m,c)\, h_{H}(m-c), & c \equiv 1 \pmod{2} \end{cases} $$
(2)

Finally, the low pass components are arranged in the top half while the high pass components are arranged in the bottom half. The same steps are repeated for several iterations based on the application. Every time the transformation is applied to the low-frequency components.
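The row-then-column filtering described above can be illustrated with a one-level 2D decomposition. The sketch assumes the Haar wavelet with averaging and differencing filters (the text does not fix a particular filter pair or normalization), even frame dimensions, and frames stored as lists of lists; the function names are hypothetical.

```python
def haar_1d(values):
    """One Haar step: low pass (averages) first, high pass (differences) second."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low + high

def ihaar_1d(coeffs):
    """Invert haar_1d."""
    half = len(coeffs) // 2
    out = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        out += [a + d, a - d]
    return out

def haar_2d(image):
    """Filter every row, then every column; low pass output ends up top-left."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]       # transpose back to row-major

def ihaar_2d(coeffs):
    """Undo the column step first, then the row step."""
    cols = [ihaar_1d(list(col)) for col in zip(*coeffs)]
    rows = [list(row) for row in zip(*cols)]
    return [ihaar_1d(row) for row in rows]

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
subbands = haar_2d(image)
for row in subbands:
    print(row)
```

The top-left quadrant of `subbands` holds the low-frequency approximation, and the other three quadrants hold the detail coefficients. Applying `haar_2d` again to the top-left quadrant gives the next decomposition level, matching the statement that each iteration is applied to the low-frequency components.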

A data hiding technique based on the LSB substitution approach in the wavelet domain is implemented in [95]. The cover frames of the video sequence are transformed into wavelet subbands using the lazy lifting wavelet transform. The three least significant bits of each transform coefficient are used to embed the secret data. Moreover, the meta-information about the hiding scheme, which the receiver requires to extract the secret data, is embedded in the LSBs of the audio component. A hybrid data hiding method [2] uses RSA encryption, a three-level 2D-DWT, a DCT operation, and LSB substitution to embed the secret data in the cover video. For each cover frame, the red component is extracted and subjected to a three-level 2D-DWT. Only the HH band of the red component is decomposed into three levels. The DCT operation is then performed on the resulting HH band (the HH band obtained after three-level decomposition). Among the obtained DCT coefficients, the middle-frequency subbands are selected for embedding the secret data. To provide additional security, the secret data is encrypted using the RSA algorithm. The DWT-based method in [49] uses only the red channel of the RGB cover frame for hiding the secret data. The red channel of the cover frame is extracted, and the DWT operation decomposes it into frequency subbands. The secret data is embedded in the HH subbands using the LSB approach. Sushmitha et al. [114] proposed an approach for hiding a secret video inside a cover video in the wavelet domain. The DWT operation is applied to the cover frames, and the LSB substitution approach is used to hide the secret frames in the HL, HH, and LH subbands of the cover frame. Moreover, the approach is extended to hide two secret videos in a single cover video. The cover video is split into two parts; one secret video is embedded in the first part and the other in the second part using the same DWT technique and LSB method.

Because adaptive steganography can provide additional security and robustness, a few adaptive steganography approaches based on the wavelet domain have been proposed in the literature. Lu et al. [74] proposed an adaptive technique to hide biometric data in the frequency subbands of the video frames. The suitable frames and the regions of interest inside each frame are selected using a motion analysis technique. Initially, a watermarking mechanism is employed to embed the sequence number of each cover frame, which allows the receiver to extract the information from the cover frame without information loss. A one-level DWT operation is performed on each watermarked frame to divide it into subbands. Motion analysis is first performed on each frame, and the frames with higher motion activity are selected. In each selected frame, motion analysis is performed again for each block to find the blocks with higher motion activity, and the secret biometric data are embedded in these blocks. A multiple object tracking algorithm is employed in [87] to find the suitable region of interest for hiding the secret data. In the cover frame, only the pixels of moving objects are chosen for embedding the secret data. The multiple object tracking algorithm, based on background subtraction and Kalman filtering, is applied to the cover frames to detect the moving objects. The secret data is embedded in the DWT or DCT coefficients of the regions that contain moving objects. Moreover, the secret data is encoded using error-correcting codes before it is embedded in the region of interest.

Skin tone areas in the cover frames are also considered as regions of interest for hiding covert information. Embedding inside skin tone areas is based on the observation that these areas have better immunity against noise. Kumar et al. [57] used the skin tone areas in the cover frames for embedding the secret data. A skin detection algorithm detects the frames and regions containing skin. The selected cover frames are decomposed into frequency subbands using a three-level DWT operation, and the secret data is embedded in the LSBs of the transform coefficients using the 1-LSB algorithm. The results show that the method is robust against MPEG-4 compression attacks. Sadek et al. [104] used an adaptive skin detection algorithm to generate a skin map for each cover frame. The skin maps are then converted to a skin-block-map to eliminate error-prone skin pixels and choose good pixels for hiding the secret data. The red and blue channels of the skin regions are used for embedding the data. These channels are transformed into the wavelet domain by applying a three-level 2D-DWT, and the secret data is hidden in the frequency coefficients. The introduction of the skin-block-map to eliminate error-prone pixels improved the robustness of the method against the MPEG-4 compression attack. However, the process is computationally expensive, and eliminating error-prone pixels reduced the hiding capacity. In [82], the facial regions present in the video frames are exploited for hiding the secret data. The work employs the Viola-Jones algorithm and the KLT tracking algorithm to detect and track the facial regions in the cover frame. The detected regions are decomposed into frequency subbands by applying the DWT. The secret message is encoded using BCH codes, and the encoded secret is embedded in the transform coefficients. Further, the key used for encoding and the information about the region of interest are embedded in the non-facial regions.

A trained artificial neural network and the LSB algorithm are used for data hiding in the DWT domain [42]. Initially, the extracted cover frames are transformed to the wavelet domain using the DWT operation. The trained artificial neural network classifier is utilized to select the suitable regions in the frequency subbands for hiding the secret data. Once the suitable regions are identified, the secret data is embedded using the LSB algorithm. Suresh et al. [112] integrated an oppositional grey wolf optimization algorithm and a DCT-based keyframe extraction mechanism for hiding the secret data in the DWT domain. The oppositional grey wolf optimization algorithm is employed for enhancing the visual quality, minimizing the distortion in the cover video after embedding, and improving the security of the secret data. The DCT operation is used to detect the scene changes and extract the keyframes. Once the keyframes are obtained, the optimal regions in the keyframes for hiding secret data are selected using the oppositional grey wolf optimization algorithm. The optimal regions are decomposed into frequency subbands by applying a two-level DWT. Only the LL and HH bands are used for embedding the secret data.

Wahab et al. [121] proposed a hybrid approach for video steganography based on the discrete wavelet transform (DWT) and histogram shifting. After transforming the cover frame to the wavelet domain by performing the DWT operation, the histogram shifting technique is implemented to embed the secret data. Unlike traditional DWT methods that directly embed the secret data in subbands, the proposed work selects the histograms of subbands with higher frequency values, and a part of each selected histogram is subjected to a shifting operation. The shifting operation is carried out to create space for embedding the secret data. Dalal and Juneja [15] utilized the frequency subbands of the luminance component for hiding the secret data. The proposed work converts the RGB cover frames to the YUV color space and extracts the luminance component (Y). A second-level 2D-DWT is applied to the Y component and 16 subbands are generated. Among the 16 generated subbands, the 8 middle subbands are used for embedding the secret data. Evaluation results show that the proposed method is robust to noise attacks and compression attacks to an extent.

Dalal and Juneja [16] conducted a study to compare the performance of orthogonal and bi-orthogonal filters used for frame decomposition in DWT domain-based data hiding methods. During implementation, the cover frame is decomposed by applying a one-level 2D-DWT with orthogonal filters and bi-orthogonal filters separately. The secret image is hidden in the LH and HL subbands of the cover frame. The obtained results show that bi-orthogonal wavelet filters outperformed the orthogonal filters for DWT domain-based video steganography applications. A comparative analysis is conducted in [109] to examine the performance of DCT, DWT, and CvT (curvelet transform) for hiding the secret data inside video in the transform domain. The LSB substitution method is implemented to hide the data in the transform domain after applying the transform function. The obtained evaluation results show that the CvT-based method delivered better imperceptibility in both conditions (with and without the presence of noise). Moreover, the DCT and DWT approaches are more computationally expensive than CvT.

A few works in the literature have utilized the integer wavelet transform (IWT) for hiding the secret data inside the video. An IWT-based fusion approach is applied for minimizing the distortion associated with hiding the secret data inside the video [90]. Both the cover frame and the secret frame are decomposed by applying the IWT. The decomposed cover frame and secret frame are fused by adding the wavelet coefficients of the respective subbands of the two frames. Then the inverse integer wavelet transform is applied to the fused matrix to generate the stego-video. Evaluation results show that the proposed method generated the stego-video with less distortion and acceptable robustness. Ramalingam and Isa [100] proposed a method based on Haar IWT and LSB for embedding text messages inside AVI video files. The RGB frames of the cover video are decomposed into frequency subbands by applying a one-dimensional Haar IWT. Before the decomposition, the video frames are subjected to normalization. Normalization is implemented to prevent the overflow or underflow that may happen while altering the transform coefficients of the cover video. The LSBs of the higher frequency bands HH, HL, and LH are used for embedding the secret text data.
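To illustrate why an integer wavelet transform suits LSB embedding, the following minimal sketch implements one common integer Haar lifting step (often called the S-transform) in one dimension. It is an assumption made for illustration and is not taken from [90] or [100]; the function names are hypothetical. The transform maps integers to integers and is exactly invertible, so no rounding loss occurs between embedding and extraction.

```python
import numpy as np

def haar_iwt_1d(x):
    """One level of integer Haar lifting on an even-length integer signal."""
    x = np.asarray(x, dtype=np.int64)
    a, b = x[0::2], x[1::2]
    d = a - b          # detail (high-frequency) coefficients
    s = (a + b) // 2   # approximation (low-frequency) coefficients, floor division
    return s, d

def haar_iiwt_1d(s, d):
    """Exact inverse of haar_iwt_1d."""
    s = np.asarray(s, dtype=np.int64)
    d = np.asarray(d, dtype=np.int64)
    a = s + (d + 1) // 2
    b = a - d
    out = np.empty(2 * len(s), dtype=np.int64)
    out[0::2], out[1::2] = a, b
    return out

row = [12, 7, 255, 0, 3, 3, 100, 101]
s, d = haar_iwt_1d(row)
restored = haar_iiwt_1d(s, d)
```

Note that the detail coefficients can span a wider range than the 8-bit input, and altering them may push reconstructed pixels outside 0 to 255, which is the overflow and underflow problem that normalization is meant to prevent.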

2.2.2 Discrete cosine transform

Like DWT, DCT is a transform function that divides the image into spectral subbands. The major difference between DCT and DWT is that the former generates more frequency bands and provides higher frequency resolution, whereas DWT generates fewer frequency bands and provides higher spatial resolution. A significant number of works in the literature used the DWT domain to embed the secret data in raw videos. Unlike DWT, the DCT domain is not frequently used in the literature to hide the secret data inside raw videos. On the other hand, video steganography methods proposed in the compressed domain have utilized the DCT domain extensively for hiding the secret data. This section discusses the DCT-based methods in the raw video domain only.

In the raw video domain, the two-dimensional DCT is applied to each frame of the video separately and transforms the frame into low, middle, and high-frequency bands. The secret data is hidden in the transform coefficients of either one or multiple bands.

Consider an arbitrary frame I of resolution J × K, and let T be the transformed frame generated by applying DCT to I. The DCT coefficients of T are calculated using the equation,

$$ T_{xy}=\alpha_{x}\alpha_{y}\displaystyle\sum\limits_{j=0}^{J-1} \displaystyle\sum\limits_{k=0}^{K-1} I_{jk}\cos\frac{\pi(2j+1)x}{2J} \cos\frac{\pi(2k+1)y}{2K} $$
(3)

After embedding the bits of secret data in the DCT coefficients, inverse two dimensional DCT is applied on the frame T to generate the frame I using the equation,

$$ I_{jk}=\sum\limits_{x=0}^{J-1} \sum\limits_{y=0}^{K-1} \alpha_{x}\alpha_{y} T_{xy}\cos\frac{\pi(2j+1)x}{2J}\cos\frac{\pi(2k+1)y}{2K} $$
(4)

where

$$ \alpha_{x}= \left\{\begin{array}{ll} \frac{1}{\sqrt{J}} & , x = 0\\ \sqrt{\frac{2}{J}} &, 1\leq x\leq J-1 \end{array}\right. $$

and

$$ \alpha_{y}= \left\{\begin{array}{ll} \frac{1}{\sqrt{K}} & , y = 0\\ \sqrt{\frac{2}{K}} &, 1\leq y\leq K-1 \end{array}\right. $$

Here, \(I_{jk}\) represents the pixel value in the cell \(jk\) (column j and row k) of the frame I. Further, \(T_{xy}\) represents the transform coefficient corresponding to the cell \(xy\) (column x and row y) of the 2D-DCT matrix.
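As a numerical check of Eqs. (3) and (4), the following minimal sketch builds the orthonormal DCT basis from the definitions of \(\alpha_{x}\) and \(\alpha_{y}\) and applies it in matrix form. The function names are hypothetical, and the matrix formulation is an implementation choice rather than something prescribed by the surveyed works.

```python
import numpy as np

def dct_matrix(n):
    """C[x, j] = alpha_x * cos(pi * (2j + 1) * x / (2n)), the 1D DCT basis."""
    j = np.arange(n)
    x = np.arange(n).reshape(-1, 1)
    basis = np.cos(np.pi * (2 * j + 1) * x / (2 * n))
    alpha = np.full((n, 1), np.sqrt(2.0 / n))
    alpha[0, 0] = np.sqrt(1.0 / n)
    return alpha * basis

def dct2(frame):
    """Forward 2D DCT of a J x K frame, as in Eq. (3)."""
    J, K = frame.shape
    return dct_matrix(J) @ frame @ dct_matrix(K).T

def idct2(coeffs):
    """Inverse 2D DCT, as in Eq. (4)."""
    J, K = coeffs.shape
    return dct_matrix(J).T @ coeffs @ dct_matrix(K)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(6, 8)).astype(np.float64)
coeffs = dct2(frame)
restored = idct2(coeffs)
```

Because the basis is orthonormal, the inverse is simply the transpose, and the coefficient at x = 0, y = 0 equals the sum of all pixels divided by \(\sqrt{JK}\).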

Rajesh and Shajin [98] utilized the DCT coefficients of the frames in raw video streams to embed the secret information. The secret data embedding procedure includes the following steps: 1. extracting the frames from the video stream, 2. dividing the frames into image blocks of size 8×8, 3. applying 2D-DCT on each image block, and 4. embedding the secret data in the less significant DCT coefficients. The less significant coefficients are detected using a predefined threshold value. To improve the security of the secret data embedded in the DCT domain, Mumthas and Lijiya [89] introduced RSA encryption, random DNA encryption, and Huffman encoding along with two-dimensional DCT-based video steganography. The secret message is encrypted using the RSA algorithm. The encrypted secret message is then subjected to random DNA encryption followed by compression. The cover frame is divided into blocks of size 8×8 and 2D-DCT is applied on each block. The compressed and encrypted secret message is embedded in the LSBs of the transform coefficients, and the inverse DCT is applied on each block to generate the stego-frame.
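The block-wise pipeline described above (split into 8×8 blocks, apply 2D-DCT, modify a coefficient, apply the inverse DCT) can be sketched as follows. This is an illustrative assumption and not the embedding rule of [98] or [89]: one bit per block is carried by the parity of a single quantized coefficient, and the coefficient position `POS` and step `STEP` are hypothetical values. The stego frame is kept in floating point; rounding and clipping to 8-bit pixels is not handled here.

```python
import numpy as np

BLOCK = 8
POS = (4, 3)   # hypothetical mid-frequency coefficient used as carrier
STEP = 16.0    # hypothetical quantization step

def dct_matrix(n=BLOCK):
    j = np.arange(n)
    x = np.arange(n).reshape(-1, 1)
    alpha = np.full((n, 1), np.sqrt(2.0 / n))
    alpha[0, 0] = np.sqrt(1.0 / n)
    return alpha * np.cos(np.pi * (2 * j + 1) * x / (2 * n))

C = dct_matrix()

def blocks(shape):
    """Yield the top-left corner of every complete 8x8 block."""
    for r in range(0, shape[0] - BLOCK + 1, BLOCK):
        for c in range(0, shape[1] - BLOCK + 1, BLOCK):
            yield r, c

def embed(frame, bits):
    stego = frame.astype(np.float64).copy()
    for (r, c), bit in zip(blocks(frame.shape), bits):
        coeffs = C @ stego[r:r + BLOCK, c:c + BLOCK] @ C.T
        q = int(np.round(coeffs[POS] / STEP))
        if q % 2 != bit:          # force the parity of q to equal the bit
            q += 1
        coeffs[POS] = q * STEP
        stego[r:r + BLOCK, c:c + BLOCK] = C.T @ coeffs @ C
    return stego

def extract(stego, n_bits):
    bits = []
    for (r, c), _ in zip(blocks(stego.shape), range(n_bits)):
        coeffs = C @ stego[r:r + BLOCK, c:c + BLOCK] @ C.T
        bits.append(int(np.round(coeffs[POS] / STEP)) % 2)
    return bits

rng = np.random.default_rng(1)
cover = rng.integers(0, 256, size=(16, 24))
secret = [1, 0, 1, 1, 0, 0]
stego = embed(cover, secret)
recovered = extract(stego, len(secret))
```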

Mstafa et al. [85] proposed a method for hiding the data in the DCT domain. Firstly, the frames of the cover video are converted into YUV color space. Then, two-dimensional DCT is applied to each plane of the YUV color space. The secret data is encoded using two error-correcting codes, BCH codes and Hamming codes. The encoded data is embedded in the DCT coefficients, except for the coefficients with zero frequencies. Obtained evaluation results show the proposed scheme achieved high embedding capacity with minimal visual distortion in the video. Moreover, the presented method is robust against the salt & pepper attack, Gaussian white noise, and the median filtering attack.

Suresh et al. [113] proposed a data hiding approach based on shuffling the data over the least significant DCT coefficients. Initially, a scene change detection technique is implemented to select the cover frames. The scene changes are detected using the inter-frame difference value. After selecting the cover frame, each color channel in the cover frame is subdivided into 64 sub-images and the DCT coefficient is computed for each sub-image. Among the 64 DCT coefficients obtained, the 8 least significant DCT coefficients are selected for hiding 1 pixel of the secret image data. A random sequence generation-based shuffling is implemented to embed each bit of secret data randomly in the 8 selected DCT coefficients of the cover frame. The proposed shuffling approach improves the security of the steganography method. A critical analysis of different methods along with the evaluation metrics and remarks for transform domain video steganography is outlined in Table 3.
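The idea of key-dependent shuffling can be sketched as follows, under the assumption that the 8 selected coefficients are available as integers and that the sender and receiver share a numeric seed. The function names and the seeding scheme are hypothetical and are not taken from [113]. The 8 bits of one secret pixel are written to the LSBs of the coefficients in a pseudo-random order that only a holder of the seed can reproduce.

```python
import random

def shuffled_order(seed, block_index, n=8):
    """Pseudo-random permutation of coefficient indices, derived from a shared seed."""
    rng = random.Random(seed * 1_000_003 + block_index)
    order = list(range(n))
    rng.shuffle(order)
    return order

def embed_pixel(coeffs, pixel, seed, block_index):
    """Write the 8 bits of `pixel` into the LSBs of 8 integer coefficients."""
    out = list(coeffs)
    for bit_pos, idx in enumerate(shuffled_order(seed, block_index)):
        bit = (pixel >> bit_pos) & 1
        out[idx] = (out[idx] & ~1) | bit
    return out

def extract_pixel(coeffs, seed, block_index):
    """Read the bits back in the same shuffled order."""
    value = 0
    for bit_pos, idx in enumerate(shuffled_order(seed, block_index)):
        value |= (coeffs[idx] & 1) << bit_pos
    return value

cover_coeffs = [3, -2, 0, 7, -5, 1, 4, -1]
stego_coeffs = embed_pixel(cover_coeffs, 173, seed=42, block_index=0)
recovered = extract_pixel(stego_coeffs, seed=42, block_index=0)
```

Without the seed, an observer who reads the LSBs in natural order obtains the bits in a scrambled order.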

Table 3 Critical analysis of video steganography methods in transform domain

3 Video steganography in compressed domain

The majority of video steganography methods proposed in the literature for data hiding in the raw domain are simple and easy to implement. However, they are more prone to various attacks, especially compression attacks. Furthermore, videos in compressed form are currently preferred for storage as well as transmission over the internet. A compressed video requires less storage space than an uncompressed video, and transferring videos in compressed form over the internet is quicker and requires less bandwidth. In this context, data hiding techniques in the compressed video domain have gained popularity in the last two decades. On the other hand, compression removes redundant video data and thereby reduces the space available for hiding data.

Among the various available video compression coding standards, MPEG-X and H.26X are the most widely utilized in recent years. Specifically, the H.264/AVC (also known as MPEG-4 Part 10) video coding standard is the predominant video codec used in the literature for hiding data in the compressed domain. The H.264 video codec has multiple novel features compared to its predecessors, such as “multiple frames reference capability”, “flexible macroblock ordering”, and intra prediction in intra-coded frames. Generally, an H.264 coded video consists of multiple groups of pictures (GOPs). Each GOP contains intra-coded frames (I-frames), predicted frames (P-frames), and bidirectional predicted frames (B-frames). The I-frame, also known as the keyframe, is independently coded and is the first frame of each GOP. The P-frame contains only the difference between the current frame and the preceding frame. The B-frame holds only the changes in the current frame from both the previous and following frames.

The video encoding procedure in the H.264 codec is shown in Fig. 5. During encoding, the initial frame (which contains all the important data and is considered the I-frame) is divided into macroblocks, where each macroblock consists of 16 × 16 pixels. The data compression process comprises various steps such as prediction, domain transformation, and encoding. The prediction utilizes the temporal and spatial redundancy in the video data. Prediction allows encoding only the difference between the current data and its prediction from previously coded data. There are two types of prediction: intra prediction and inter prediction. Intra prediction generates the prediction of macroblocks based on previously coded data in the current frame, while inter prediction generates the prediction based on the data in the previously coded frames. Motion estimation and motion compensation techniques are utilized to predict the frame. The difference between the prediction and the current macroblock is known as the residual. The block of residuals is subjected to domain transformation using an integer transform. An integer approximation of the DCT is the most commonly used transform. The block of transformed coefficients is quantized to reduce the precision of the coefficients.
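The residual, transform, and quantization steps can be sketched for a single 4×4 block as follows. The matrix `CF` is the 4×4 forward core transform commonly associated with H.264; the uniform quantizer is a deliberate simplification (an assumption), since a real encoder folds position-dependent scaling into quantization. The function names are hypothetical.

```python
import numpy as np

# 4x4 forward core transform matrix (integer approximation of the DCT).
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int64)

def residual(current, prediction):
    """Difference between the current block and its prediction."""
    return current.astype(np.int64) - prediction.astype(np.int64)

def core_transform(res):
    """Integer transform of a 4x4 residual block: W = CF * X * CF^T."""
    return CF @ res @ CF.T

def quantize(coeffs, step):
    """Simplified uniform quantizer (assumption, not the H.264 quantizer)."""
    return np.round(coeffs / step).astype(np.int64)

current = np.full((4, 4), 103, dtype=np.uint8)
prediction = np.full((4, 4), 100, dtype=np.uint8)
coeffs = core_transform(residual(current, prediction))
levels = quantize(coeffs, step=16)
```

For this flat residual, all of the energy lands in the DC coefficient, which illustrates why well-predicted blocks compress to very few non-zero levels.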

Fig. 5 H.264 video encoding

The final step of encoding converts the various values obtained in the previous steps (quantized DCT coefficients, data required by the decoder to reconstruct the prediction, other data about the video sequence, etc.) and the syntax elements to binary codes.

In the compressed domain, data hiding is implemented in two ways: data hiding along with the video encoding procedure and data hiding in the encoded bit stream. The data hiding techniques that operate along with the video encoding procedure utilize various syntax elements related to the video coding task for embedding the secret data. A general overview of data hiding in syntax elements of the compressed domain is presented in Fig. 6. The methods that hide data in the encoded bit stream exploit the entropy coding modules to carry the secret data.

Fig. 6 An overview of data hiding in compressed domain

3.1 DCT/DST coefficients

In the literature, the quantized DCT coefficients obtained during the video encoding procedure have been utilized predominantly for hiding the secret data in H.264 videos. Generally, the secret data is embedded into the DCT coefficients of the 4×4 luminance blocks of the cover frame (especially the I-frame). Intra-frame distortion drift is one of the main challenges faced while embedding secret data in the compressed video domain. In intra-frame prediction, the current block is reconstructed as the sum of residual values and predicted values. The predicted values are computed from the samples of the neighboring encoded blocks. If a neighboring encoded block is used for hiding secret data, it carries distortion due to the embedding. Since the predicted values are computed from the neighboring blocks, this embedding-induced distortion propagates to the predicted block through intra-frame prediction. Most of the earlier works in the literature did not consider intra-frame distortion drift and therefore experienced high visual distortion with low embedding capacity.

Ma et al. [76] proposed a novel method for hiding the secret data in quantized DCT coefficients with limited intra-frame distortion drift. The coefficients of the 4×4 luma blocks in I-frames are utilized for embedding the secret data. After entropy decoding the H.264 bitstream, a pair of quantized DCT coefficients is selected from each 4×4 luma block to hide the secret data while controlling the intra-frame distortion drift. One of the DCT coefficients in the selected pair is used for embedding, while the other is intended for compensating the intra-frame distortion drift. The correlation between the DCT coefficients and the distortion induced in the pixels (which are utilized in intra-frame prediction) is examined to select the appropriate pair of DCT coefficients. Based on [76], a method for data hiding in quantized DCT coefficients without intra-frame distortion drift is proposed in [77]. Similar to [76], the proposed method utilized a pair of DCT coefficients to embed the secret data as well as to absorb the distortion drift. Moreover, the directions of intra-frame prediction are exploited to avert the distortion drift. Even though combining paired DCT coefficients with the direction of intra-frame prediction can prevent the intra-frame distortion drift, the hiding capacity of the method is low. Only about 50% of the capacity of the luminance blocks is utilized for data embedding.
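The compensation idea behind paired coefficients can be illustrated numerically. The sketch below is an assumption made for illustration: it uses the orthonormal 4-point DCT instead of the H.264 integer transform, ignores quantization, and picks the pair at positions (1, 0) and (1, 2) by hand, which is not claimed to be the pair selection of [76] or [77]. It shows that changing one coefficient alone disturbs the rightmost pixel column (which neighboring blocks use for prediction), whereas an opposite change in the paired coefficient cancels that disturbance.

```python
import numpy as np

def dct_matrix(n=4):
    j = np.arange(n)
    x = np.arange(n).reshape(-1, 1)
    alpha = np.full((n, 1), np.sqrt(2.0 / n))
    alpha[0, 0] = np.sqrt(1.0 / n)
    return alpha * np.cos(np.pi * (2 * j + 1) * x / (2 * n))

C = dct_matrix()

def pixel_change(delta_coeffs):
    """Pixel-domain change caused by a change in the 4x4 coefficient block."""
    return C.T @ delta_coeffs @ C

# Embedding by modifying a single coefficient.
single = np.zeros((4, 4))
single[1, 0] = 1.0

# Embedding with a compensating change in the paired coefficient.
paired = single.copy()
paired[1, 2] = -1.0

drift_single = pixel_change(single)[:, 3]   # error in the rightmost column
drift_paired = pixel_change(paired)[:, 3]   # error after compensation
```

The cancellation works because the two basis functions involved take the same value in the rightmost column, so equal and opposite changes leave that column untouched while the interior pixels still carry the embedding change.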

To utilize the full capacity of luminance blocks for data hiding without any intra-frame distortion drift, a DCT-based perturbation method is proposed in [62]. In the approach of Ma et al. [77], each 4×4 block is embedded with 3 bits of the secret data. Unlike [77], the proposed approach utilized the quantized DCT coefficients of each 4×4 block to embed 4 bits of secret data. Usually, embedding more data increases the visual distortion. To cope with the distortion induced in the video by increasing the amount of secret data hidden in each block, a DCT-based perturbation scheme is introduced. In the perturbation scheme, a new filtered 4×4 luma block is selected to hide the secret data by perturbing the related quantized DCT coefficients. The quantitative evaluation of the proposed DCT-based perturbation scheme shows that the embedding capacity is improved without compromising the visual quality. Reference [91] focused on further improving the hiding capacity of the quantized DCT coefficients-based method without harming the video quality. To embed the secret data and prevent the intra-frame distortion drift, the quantized DCT coefficients are classified into two different clusters. The first cluster is reserved to embed the secret data, while the second cluster is utilized to prevent the intra-frame distortion drift. An embedding modification direction table is introduced to embed the secret data in DCT coefficients with minimal distortion. Initially, an embedding modification direction table for the secret bits is created. Each element in the table corresponds to one bit of the secret message. Then, the embedding modification direction value of the cluster reserved for embedding (the embedding cluster) is calculated by a specific equation. Afterwards, the difference between the embedding modification direction value of the embedding cluster and the decimal value of the n secret bits is computed to regulate the embedding and distortion.

In another work, Liu et al. [67] focused on improving the robustness of the data embedding approach [77], which uses paired quantized DCT coefficients and the directions of intra-frame prediction for preventing distortion drift. In that context, an error-correcting code (BCH code) is used to encode the secret message before embedding it in the quantized DCT coefficients. The robustness of the proposed approach is examined by exposing the secret data to re-encoding and re-quantization attacks. The quantitative results show that the robustness of the proposed method is improved by 25% and that 100% of the secret message is recovered when the secret data is exposed to a re-encoding attack. However, the recovery of the secret message bits is impossible if a frame drop happens. In [66], Liu et al. further improved the robustness of [77] to handle the frame loss problem. Shamir’s (t,n)-threshold secret sharing is implemented to process the secret data before embedding. The secret data is divided into n sub-secrets with the help of Shamir’s (t,n)-threshold secret sharing. The sub-secrets are embedded in the quantized DCT coefficients of the 4×4 luma blocks by following the embedding conditions defined in [77] to prevent intra-frame distortion drift with improved robustness. Compared to [77], the use of Shamir’s secret sharing for processing the secret message improved the survival rate by about 60% when the stego-video experienced frame loss. Reference [66] is further extended in [65] to develop a robust reversible data hiding method without intra-frame distortion drift. The major contribution of that work is the complete recovery of the original cover video after extracting the secret data.
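To show why (t,n)-threshold secret sharing helps under frame loss, the following minimal sketch implements Shamir's scheme over a prime field. The prime, the function names, and the use of a seeded generator are assumptions for illustration (a real system would use a cryptographically secure random source), and the sketch does not reproduce how [66] maps sub-secrets to DCT coefficients. Any t of the n sub-secrets recover the secret, so losing the frames that carry up to n − t sub-secrets is tolerated.

```python
import random

PRIME = 2**31 - 1  # a Mersenne prime; the secret must be smaller than this

def split_secret(secret, t, n, rng):
    """Split `secret` into n shares; any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)   # modular inverse via Fermat
        secret = (secret + yi * num * inv_den) % PRIME
    return secret

shares = split_secret(123456, t=3, n=5, rng=random.Random(7))
# Simulate frame loss: only shares 1, 3, and 5 survive.
survivors = [shares[0], shares[2], shares[4]]
recovered = recover_secret(survivors)
```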

In the last decade, most of the steganography techniques focused on embedding the data in quantized DCT coefficients have addressed the intra-frame distortion drift and proposed solutions to overcome this issue. To further improve the quality of the cover video, reduce the impact induced by embedding the secret data, and increase the security of the data hiding approach, the syndrome trellis code (STC) has been utilized by a few works in recent years [11, 122, 130]. Cao et al. [11] proposed a content-adaptive data hiding approach based on STC for H.264 videos. A new method called cover block decoupling is introduced to reduce the impact induced by the embedding. Two cover block decoupling strategies, a passive strategy and an active strategy, are presented in this work. The passive strategy selects the non-referenced blocks (the blocks that are not referenced for intra prediction) as the cover blocks for data embedding. Since H.264 coding contains only very few non-referenced blocks, choosing them alone for data hiding severely limits the hiding capacity. To increase the capacity, the active strategy embeds the data in the first block of each macroblock, while the rest of the blocks are utilized as a buffering zone to suppress the impact caused by the embedding. Most of the methods discussed above in this section address the mitigation of inter-block distortion but not inner-block distortion. An STC-based method addresses both inter-block distortion and inner-block distortion to further improve the video quality as well as the security [130]. The embedding of the secret data in DCT coefficients is based on three predefined strategies: 1. if the current block is not referenced for prediction, then all coefficients are used to embed the secret data independently; 2. if the pixels in the rightmost column or bottommost row of the current block are referenced for adjacent block prediction, then paired coefficients are employed for embedding as well as compensating the distortion; 3. if the rightmost subblock and bottommost subblocks of the current block are referenced for the prediction of adjacent blocks, then four coefficients (one coefficient for embedding and the rest for compensating the distortion induced by the embedding) are selected for embedding and distortion compensation.

H.265/HEVC (high-efficiency video coding) is a next-generation video coding standard that can provide more compression than H.264/AVC without affecting the video quality. H.265/HEVC encoding includes steps similar to those of H.264/AVC encoding. Analogous to macroblocks in H.264, a coding tree unit (CTU) of size up to 64×64 pixels is the basic coding unit in H.265/HEVC. The CTU can be further divided into multiple coding units of size 32×32 pixels, 16×16 pixels, 8×8 pixels, and 4×4 pixels by following a quadtree structure. Furthermore, HEVC uses two different integer transforms, based on DCT and DST, to code the residual blocks. In the literature, a few works utilized the quantized transform coefficients (DCT coefficients and DST coefficients) of intra-predicted residuals of the H.265/HEVC bitstream to embed the secret data. The method of Chang et al. [14] is the first in the literature to utilize DST/DCT coefficients for information hiding in H.265/HEVC videos. The proposed method provides a solution for the intra-frame distortion drift as well as the inter-frame distortion drift caused by embedding the secret message in DST/DCT coefficients. LSBs of the quantized transform coefficients are utilized for embedding the watermark data in H.265 video [116]. Only non-zero quantized transform coefficients are considered for embedding the watermark bits. The DST coefficients of the 4×4 luminance blocks are used for embedding the secret data in [70]. To handle the intra-frame distortion drift, the proposed approach introduced three conditions based on the directions of intra-frame prediction and used a multi-coefficient approach for data embedding. In the multi-coefficient approach, a triad of coefficients is used to handle embedding and distortion prevention. Among the three coefficients, one is used for data embedding and the rest are utilized to reduce the distortion caused by the embedding of secret data. Liu et al.’s method [68] used the quantized DCT coefficients of 8×8 luminance blocks for embedding the secret data. Two conditions based on the directions of intra-frame prediction and multi-coefficient-based embedding are employed for preventing the intra-frame distortion drift. Two types of multi-coefficient-based embedding approaches are used in the proposed work. The first type uses a combination of four coefficients, where one coefficient is for embedding the secret data and the other three are used to prevent distortion. The second type uses a pair of coefficients, where one coefficient is reserved for embedding and the other for compensating the distortion. Liu and Xu [71] proposed a robust steganography method for H.265 by utilizing the multi-coefficients of the selected 4×4 luminance DST blocks for data embedding. To enhance the robustness and security of the secret message, Shamir’s (t, n)-threshold secret sharing is introduced to encode the secret message before embedding. Similar to their earlier work in [70], three conditions based on the directions of intra-frame prediction and multi-coefficient-based data embedding are implemented to avert the intra-frame distortion drift.

3.2 Motion vectors

The motion vector is an essential syntax element in the process of video encoding, and it is utilized for motion estimation as well as motion compensation to reduce the temporal redundancy. The motion compensation technique allows the prediction of the current block from previous or future blocks by accounting for the motion of objects in the frame. The motion vector is utilized widely for embedding the secret data in compressed videos. The earlier works in the literature considered embedding the secret data directly into the motion vectors. Most of the existing motion vector-based video steganography algorithms can be classified according to the modification approach applied to the motion vectors for data embedding. Modification of the magnitude of the motion vector and modification of the phase angle of the motion vector are the two approaches generally implemented in the literature. The magnitude-based approach alters the magnitude value of a suitable motion vector by adding or subtracting 1 according to the secret data bits. The phase angle modification method maps the Cartesian coordinate system onto a set of imaginary sections and treats each section as representing 0 or 1. Based on the secret bit to be hidden, the selected motion vectors are rotated for data embedding.

Zhang et al. [137] and Xu et al. [126] considered some of the suitable motion vectors among all available motion vectors to embed the secret data bits. The suitable motion vectors are called candidate motion vectors and their magnitude is greater than a predefined threshold value. The phase angle of the motion vectors [23, 30] is utilized to hide the secret data bits. Initially, the candidate motion vectors are selected based on comparing the magnitude of the available motion vector with a predefined threshold value. Then the phase angle difference between successive motion vectors (candidate motion vectors) is utilized to embed the secret data bits.
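A minimal sketch of threshold-based candidate selection combined with a plus or minus 1 modification is given below. It is an illustration under stated assumptions and not the algorithm of [137] or [126]: the bit is carried by the parity of the horizontal component, and the component is always moved away from zero so that the magnitude never falls below the threshold and the receiver selects the same candidates. The function names and the threshold value are hypothetical.

```python
import math

def is_candidate(mv, threshold):
    """A motion vector is a candidate if its magnitude exceeds the threshold."""
    return math.hypot(mv[0], mv[1]) > threshold

def embed_bits(motion_vectors, bits, threshold):
    out = []
    it = iter(bits)
    for h, v in motion_vectors:
        if is_candidate((h, v), threshold):
            bit = next(it, None)
            if bit is not None and abs(h) % 2 != bit:
                h += 1 if h >= 0 else -1   # move away from zero by one unit
        out.append((h, v))
    return out

def extract_bits(motion_vectors, n_bits, threshold):
    bits = [abs(h) % 2 for h, v in motion_vectors
            if is_candidate((h, v), threshold)]
    return bits[:n_bits]

cover_mvs = [(5, 2), (1, 0), (-4, 3), (0, 6), (2, 1)]
secret = [0, 1, 1]
stego_mvs = embed_bits(cover_mvs, secret, threshold=3)
recovered = extract_bits(stego_mvs, len(secret), threshold=3)
```

Here the short vectors (1, 0) and (2, 1) fall below the threshold and are left untouched, which mirrors the motivation for restricting embedding to vectors of larger magnitude.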

A video steganography scheme based on motion vectors and matrix encoding is proposed in [59]. The proposed method is focused on hiding the data in macroblocks that are moving at high speed. The macroblocks moving at high speed are identified based on the magnitude of the motion vector. The motion vector of a macroblock with an amplitude higher than a predefined threshold value is selected for embedding the data. The matrix encoding scheme is introduced to decrease the modification rate of the motion vectors. Aly et al. [5] utilized the motion vectors used for encoding and decoding the P-frames and B-frames in the compressed video for hiding the secret data. Unlike [126, 137], which selected candidate motion vectors based on motion vector attributes such as magnitude or phase angle, the proposed method considered motion vectors that are associated with macroblocks of high prediction error as the candidate motion vectors. The data is hidden in both the vertical and horizontal components of the candidate motion vectors in P- and B-frames.
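How matrix encoding lowers the modification rate can be seen from the classic (1, 3, 2) scheme, which hides 2 message bits in 3 cover bits while changing at most one of them. The sketch below is a generic illustration; the parameters actually used in [59] are not stated here, and applying the scheme to the LSBs of three motion vector components is an assumption.

```python
def syndrome(lsbs):
    """XOR of the 1-based positions whose bit is set."""
    s = 0
    for pos, bit in enumerate(lsbs, start=1):
        if bit:
            s ^= pos
    return s

def matrix_embed(lsbs, message):
    """Embed a 2-bit message (0..3) into 3 cover bits with at most one flip."""
    out = list(lsbs)
    flip = syndrome(out) ^ message
    if flip:                 # flip == 0 means the cover already encodes the message
        out[flip - 1] ^= 1
    return out

def matrix_extract(lsbs):
    return syndrome(lsbs)

cover = [1, 0, 1]            # e.g. LSBs of three candidate motion vector components
stego = matrix_embed(cover, 3)
message = matrix_extract(stego)
```

With plain LSB replacement, 2 message bits change 1 cover bit on average; here at most 1 of 3 cover bits changes and the expected number of changes is 0.75, so fewer motion vectors are disturbed per embedded bit.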

Cao et al. [135] proposed an adaptive approach for hiding secret messages in MPEG-4 videos by utilizing the motion estimation process. A novel technique called perturbed motion estimation (PME) is implemented to estimate the motion as well as to embed the secret data. PME is inspired by Fridrich et al.’s perturbed quantization steganography based on the wet paper code. According to Fridrich et al., the security of a steganography method can be improved by embedding the secret data in adaptively selected components of the cover data without sharing the adaptive selection rule with the recipient. To achieve adaptive embedding without sharing the adaptive region selection rule, the wet paper code (“a simple variable-rate random linear code”) is introduced. PME perturbs the motion estimation process associated with the encoding of some candidate macroblocks to hide the secret message. A selection rule based on the MSE value is introduced to select the candidate macroblocks. In traditional motion vector-based steganography methods, embedding the secret data in a motion vector shifts the locally optimal motion vector to a non-optimal one and thereby leaves clues of data embedding. This issue makes the embedding methods vulnerable to motion vector-based steganalysis attacks. To improve the security of motion vector-based steganography and its robustness against steganalysis systems, Cao et al. [12] introduced a data hiding scheme based on the syndrome trellis code and the uncertainty of motion estimation in H.264 videos. The syndrome trellis code is introduced to reduce the overall embedding impact, while the uncertainty in motion estimation is utilized to address the distortion induced by the shifting of the motion vector from optimal to non-optimal. The proposed method relates the embedding-induced distortion in a motion vector to its associated uncertainty to achieve higher security against steganalysis attacks. Yao et al. [133] proposed an effective method to enhance the security of data hiding in the motion vectors associated with the process of H.264 encoding. A distortion function is defined to express the embedding impact on motion vectors. The distortion function is designed by considering the change in the prediction error due to modification of the motion vectors and the change in the statistical distribution of the motion vectors. The proposed data embedding method consists of three stages. In the first stage, the embedding distortion for each motion vector in the cover frame is defined by using the designed distortion function. The second stage involves the data embedding procedure and modification of the motion vectors while limiting the changes. Two-layered syndrome trellis codes are introduced for data embedding to achieve minimal distortion steganography. Finally, the video is encoded using the modified motion vectors. The evaluation results show that the proposed approach has significantly improved the security of the motion vector-based data hiding approach. Zhang et al. [136] focused on preserving the local optimality of the modified motion vectors after embedding the secret data in them. By preserving the local optimality of the modified motion vectors, the traces or clues generated in the cover videos due to the data embedding can be minimized, and thereby the security of the steganography method can be improved. The proposed method considers a search area in the cover frame that consists of multiple motion vectors. Before modifying a candidate motion vector, the local optimality of each candidate motion vector is evaluated, and the candidate motion vector that contributes minimal degradation to the video encoding efficiency is selected for the modification.

A reversible data hiding algorithm using histogram shifting of motion vector values for H.264 videos is proposed in [127]. An error propagation control mechanism is employed to reduce the distortion induced by the modification of motion vectors. The motion vector values of the selected reference frame are not modified and the motion vectors associated with non-referenced frames are utilized to embed the secret data.

3.3 Intra prediction modes

Intra prediction in the video encoding procedure generates the prediction of macroblocks based on the previously coded data in the same frame. The video encoding process utilizes multiple intra-prediction modes to encode the macroblocks. In the H.264/AVC coding standard, there are 13 prediction modes, which include nine for 4 × 4 blocks and four for 16 × 16 blocks. The H.265/HEVC codec has 35 intra-prediction modes, of which 33 are angular and the remaining two are the planar and DC prediction modes. In the literature, the intra-prediction modes are utilized to embed the secret data in H.264 and H.265 videos by mapping the modes to secret data bits.

Hu et al. [33] proposed an intra-prediction mode-based data hiding scheme in H.264/AVC videos. The proposed method is configured to embed 1 bit of secret data in each 4 × 4 luma block by altering the 4 × 4 intra-prediction modes. A predefined rule is implemented to select the candidate luma blocks for embedding the secret data. The candidate luma blocks are modified to hide 1 bit of secret data based on a predefined mapping algorithm between the secret data bit and the intra-prediction mode. In [140], the prediction difference of intra 4 × 4 blocks is utilized for data hiding in H.264 videos. That method first groups the set of 4 × 4 prediction modes into two disjoint subsets (A and B) based on the prediction difference of the intra 4 × 4 blocks. The mapping rule between secret binary bits and prediction mode is given as,

$$ \text{mode} \in \left\{\begin{array}{ll} A & \text{if } s_{i} = 0\\ B & \text{if } s_{i} = 1 \end{array}\right. $$
(5)

where mode is the coding mode of the current block and s_i is the hidden data bit. Further, logistic mapping rules are implemented to randomly choose the hidden location and improve the security of the proposed method.
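
The mapping rule in (5) can be illustrated with a short sketch. The partition `SUBSET_A`/`SUBSET_B` and the per-mode `cost` table below are hypothetical placeholders: the cited method derives its subsets from the prediction difference, which is not reproduced here, so this only shows how a bit could be carried by subset membership.

```python
# Hypothetical partition of the nine H.264 intra 4x4 modes (0-8) into two
# disjoint subsets. The real subsets in [140] depend on the prediction
# difference; these values are placeholders for illustration only.
SUBSET_A = (0, 2, 4, 6, 8)
SUBSET_B = (1, 3, 5, 7)


def embed_bit(best_mode: int, bit: int, cost: dict[int, float]) -> int:
    """Return the mode to encode so that its subset carries `bit`.

    `cost` is an assumed per-mode coding cost (lower is better).
    """
    target = SUBSET_A if bit == 0 else SUBSET_B
    if best_mode in target:
        return best_mode  # encoder's choice already carries the bit
    # Otherwise fall back to the cheapest mode inside the required subset.
    return min(target, key=lambda m: cost[m])


def extract_bit(mode: int) -> int:
    """Recover the hidden bit from the subset the mode belongs to."""
    return 0 if mode in SUBSET_A else 1
```

The receiver needs only the partition, not the cost table, to read the bit back.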

Yang et al. [132] used the intra-prediction modes and matrix coding to hide the secret data in H.264/AVC video stream. Here, a 4 × 4 block is selected for embedding the secret data only if the respective block is a candidate block based on certain predefined rules and all of the 4 × 4 blocks within a 16 × 16 block are of different prediction modes. To implement secret data-Intra prediction mapping, two secret data bits are mapped to every three 4 × 4 blocks by matrix coding.
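
Mapping two secret bits onto three carriers is the classic (1, 3, 2) matrix coding construction, in which at most one of the three carriers is changed. The sketch below works on abstract carrier bits; how a carrier bit is derived from a 4 × 4 block's prediction mode (for example, its parity) is an assumption and is not taken from [132].

```python
def matrix_embed(carrier, message):
    """Embed 2 message bits into 3 carrier bits, flipping at most one."""
    a1, a2, a3 = carrier
    x1, x2 = message
    d1 = x1 ^ a1 ^ a3  # 1 if the first syndrome bit is wrong
    d2 = x2 ^ a2 ^ a3  # 1 if the second syndrome bit is wrong
    out = [a1, a2, a3]
    if d1 and d2:
        out[2] ^= 1    # flipping a3 corrects both syndrome bits
    elif d1:
        out[0] ^= 1
    elif d2:
        out[1] ^= 1
    return tuple(out)


def matrix_extract(carrier):
    """Read the 2 message bits back from 3 carrier bits."""
    a1, a2, a3 = carrier
    return (a1 ^ a3, a2 ^ a3)
```

Compared with writing one bit per block, this changes on average fewer carriers per embedded bit, which is the reason matrix coding is attractive for mode-based embedding.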

Zhang et al. [139] proposed an adaptive video steganography algorithm based on intra-prediction mode. The proposed scheme hides the secret data in selected 4 × 4 blocks altering the prediction mode based on the predefined mapping rules between secret data bit and intra-prediction mode. Unlike other intra-prediction mode-based methods which select hostable blocks/candidate blocks using the different scrambling algorithms, the proposed method introduced the Syndrome trellis code-based algorithm to choose the regions with more complex textures for data embedding.

3.4 Entropy coding modules

In the video encoding process, the entropy coding method is implemented to encode various parameters obtained in the previous stages of the encoding process including syntax elements, coded block patterns, motion vectors, residual data, reference frame index, etc. Context-based adaptive binary arithmetic coding (CABAC) and context-adaptive variable-length coding (CAVLC) are the two entropy coding schemes implemented in various video encoding standards including H.264/AVC and H.265/HEVC. The property of the entropy coding schemes is utilized in the literature for concealing secret data in compressed video domains.

Liao et al. [61] proposed an entropy coding scheme-based data hiding technique in H.264/AVC videos. The proposed method achieves data hiding by embedding the secret data bits in the trailing ones of 4 × 4 blocks during the CAVLC procedure. To choose the block positions for embedding the secret data, chaotic map-based random numbers are generated initially. After the blocks are chosen, the secret information bits are hidden in the trailing ones. The proposed method achieved acceptable imperceptibility with low embedding capacity. Ke and Weidong [44] utilized the property of CAVLC for data hiding in H.264 videos. The data embedding is implemented by altering the sign flag of the trailing ones and the parity flag of the codewords of different levels in CAVLC. The proposed approach considers the trailing ones of high-frequency non-zero coefficients in the luma components for data hiding. The trailing ones are modified according to the secret data bit: the codeword is given even parity if the secret bit is ‘0’ and odd parity if the secret bit is ‘1’.

Zhang et al. [138] proposed a data hiding scheme based on the trailing coefficients obtained in each DCT transform block during H.264 encoding process. The proposed approach applies DCT transform on frames of the video sequence to obtain DCT coefficients and scans the DCT blocks in order. Among the arranged blocks, odd number blocks are selected for embedding the secret data and even-numbered blocks are used as correcting blocks. In the odd blocks, the value of the trailing coefficient is negative when the secret bit is 0 and the value is positive when the secret bit is 1. The proposed method displayed acceptable robustness against various noise attacks.
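
The sign rule described above can be sketched as follows. This is a simplified illustration, not the implementation of [138]: blocks are assumed to be plain lists of quantized coefficients in scan order, the "trailing coefficient" is taken to be the last non-zero one, and the role of the even-numbered correcting blocks is omitted.

```python
def last_nonzero(block):
    """Index of the last non-zero coefficient, or None if all are zero."""
    for pos in range(len(block) - 1, -1, -1):
        if block[pos] != 0:
            return pos
    return None


def embed_sign_bits(blocks, bits):
    """Hide bits in the sign of the trailing coefficient of odd-numbered blocks."""
    out = [list(b) for b in blocks]
    remaining = iter(bits)
    for number, block in enumerate(out, start=1):
        if number % 2 == 0:
            continue  # even-numbered blocks are reserved for correction
        pos = last_nonzero(block)
        if pos is None:
            continue  # nothing to modify in an all-zero block
        bit = next(remaining, None)
        if bit is None:
            break
        magnitude = abs(block[pos])
        block[pos] = magnitude if bit == 1 else -magnitude
    return out


def extract_sign_bits(blocks, n_bits):
    """Read n_bits back from the trailing-coefficient signs."""
    bits = []
    for number, block in enumerate(blocks, start=1):
        if len(bits) == n_bits:
            break
        if number % 2 == 0:
            continue
        pos = last_nonzero(block)
        if pos is not None:
            bits.append(1 if block[pos] > 0 else 0)
    return bits
```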

Xu et al. [128] utilized codeword substitution to embed the secret data in encrypted streams of H.264/AVC video sequences. The sensitive parts of the compressed video sequences, including motion vector differences, intra-prediction modes, and residual coefficients, are encrypted using a stream cipher. A codeword substitution approach named the bin-string substitution technique is employed to hide the secret data bits in the encrypted domain. The proposed method managed to maintain the same bitrate even after encryption and data hiding. Reference [129] presented an improved version of the encrypted-stream data hiding of [128], focused mainly on increasing the embedding capacity without affecting the visual quality. In [128], only the codewords of levels whose suffix length is 2 or 3 are utilized for data embedding, by single codeword substitution. In [129], the codewords of levels with suffix length 1 are also used: paired codeword substitution is implemented for data embedding when the suffix length is 1, and for suffix lengths greater than 2 a multiple-based notational system is adopted instead of single codeword substitution. Critical analyses of the different methods proposed in the compressed domain are given in Table 4.

Table 4 Critical analysis of video steganography methods in compressed domain

4 Features of video steganography

Evaluation of video steganography methods is important to determine the performance and efficiency of the method. The main features expected from good steganography methods are imperceptibility, hiding capacity, security, robustness, and resistance to other steganalysis attacks. In this section, each feature of the video steganography method is elaborated on in detail.

4.1 Imperceptibility

Imperceptibility is the capability of the method to hide secret information that is not visible to the human eyes. Humans should not be able to interpret the secret information hidden. This measure is more related to the visibility of the resulting videos. Higher imperceptibility means lower distortion and higher visual quality of the stego video. Many methods have displayed the constructed stego video results and the extracted secret image results to show the imperceptibility of the method.

Different evaluation metrics are used for measuring the imperceptibility of the method. The most commonly used measures are Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and the Structural Similarity Index Measure (SSIM). MSE is calculated by taking the mean of the squared error between the input and output frames. PSNR is calculated with the MSE value as the base. Equations (6) and (7) show the formulas used for calculating MSE and PSNR respectively. PSNR is measured in decibels (dB), and both measures are widely used because of their simplicity.

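$$ MSE = \frac{{\sum}_{R,C}[I_{1}(r,c) - I_{2}(r,c)]^{2}}{R*C} $$
(6)
$$ PSNR = 10 * {\log_{10}\frac{E^{2}}{MSE}} $$
(7)

Here I_1 and I_2 are the input and output frames of size R × C, and E is the maximum possible pixel value (255 for 8-bit frames).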
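
As an illustration, MSE and PSNR can be computed as in the following minimal sketch, which uses the standard squared-error definition. Frames are assumed to be equally sized 2-D lists of intensities, and the `peak` default of 255 assumes 8-bit samples; both are assumptions made for the example.

```python
import math


def mse(frame_a, frame_b):
    """Mean squared error between two equally sized 2-D frames."""
    rows, cols = len(frame_a), len(frame_a[0])
    total = sum(
        (frame_a[r][c] - frame_b[r][c]) ** 2
        for r in range(rows)
        for c in range(cols)
    )
    return total / (rows * cols)


def psnr(frame_a, frame_b, peak=255):
    """PSNR in dB; infinite when the frames are identical."""
    err = mse(frame_a, frame_b)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)
```

For a video, these values are typically computed per frame and then averaged over the sequence.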

SSIM is another measure, used to quantify the perceived degradation between images caused by compression or reconstruction. The loss between two instances of an image is calculated by keeping one as the reference and comparing it against the processed image. In the case of video steganography, the input images are the reference, and the reconstructed stego image and the extracted secret image are the processed images. The formula for calculating the SSIM value is given in (8), where μ_x and μ_y are the means of the two images, σ_x^2 and σ_y^2 their variances, σ_xy their covariance, and C_1 and C_2 small constants that stabilize the division. PSNR captures the pixel-level difference between two images, whereas SSIM captures the visually perceived difference. SSIM is therefore considered a better metric of image degradation than PSNR.

$$ SSIM(x,y) = \frac{(2\mu_{x}\mu_{y} + C_{1}) (2 \sigma_{xy} + C_{2})} {({\mu_{x}^{2}} + {\mu_{y}^{2}}+C_{1}) ({\sigma_{x}^{2}} + {\sigma_{y}^{2}}+C_{2})} $$
(8)
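
A minimal sketch of SSIM follows, using the standard form in which the luminance term and the contrast-structure term are multiplied. It evaluates the statistics once over the whole image, whereas practical implementations average SSIM over local windows; the constants `k1 = 0.01` and `k2 = 0.03` are the conventional defaults, and the flat list input is an assumption made for brevity.

```python
def ssim_global(x, y, peak=255, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally long lists of intensities."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * peak) ** 2  # stabilizes the luminance term
    c2 = (k2 * peak) ** 2  # stabilizes the contrast-structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical inputs give a value of 1, and any difference between the inputs lowers it.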

4.2 Hiding capacity

Capacity is a metric to measure the amount of secret media that is hidden inside the cover image with minimum distortion. Hiding capacity is also called embedding capacity as it directly refers to the amount of secret information that can be embedded. Hiding capacity is calculated by dividing the amount of secret information by the total size of the cover video.

Equation (9) shows the formula to calculate the hiding capacity of the video steganography. Capacity is given in terms of bits per pixel (bpp) which indicates the number of secret bits that can be hidden inside each pixel of the cover frame.

$$ C_{H} = \frac{C_{s}}{C_{c}} $$
(9)

C_H is the hiding capacity of the embedding algorithm, C_s is the size of the secret information, and C_c is the total size of the cover video.
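
As a small worked example of (9), the sketch below expresses capacity in bits per pixel. Taking C_c as the number of pixels in the cover video (width × height × frames) is an assumption made to obtain bpp; the function and parameter names are illustrative.

```python
def hiding_capacity_bpp(secret_bits, width, height, frames):
    """Hiding capacity C_H = C_s / C_c in bits per pixel."""
    cover_pixels = width * height * frames
    if cover_pixels <= 0:
        raise ValueError("cover video must contain at least one pixel")
    return secret_bits / cover_pixels
```

For instance, hiding 1,013,760 bits in ten CIF frames (352 × 288) corresponds to 1 bpp.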

Entropy is another measure used to assess the embedding capacity of the method. It measures the total amount of information carried by the video, taking into account the information density, and thereby the randomness of the video. With the definition in (10), the value ranges from 0 to log2(M) bits, that is, from 0 to 8 for 8-bit frames. In (10), M is the number of intensity levels and N_m is the probability that intensity level m occurs.

$$ E = -\sum\limits_{m=1}^{M} N_{m} \log_{2}(N_{m}) $$
(10)
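
The entropy of a frame can be sketched as below, using the standard Shannon definition with a leading minus sign. The probabilities are estimated from the empirical histogram of a flat list of pixel values, which is an assumption about the input format.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of intensity values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
```

A constant frame gives 0 bits, and a frame in which all 256 levels are equally likely gives 8 bits.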

4.3 Robustness

Robustness refers to the extent to which the secret media is embedded and retrieved without any loss of information. The secret information should be communicated across the users without any loss. The robustness of the steganography method can be measured using its resistance against different noise attacks. Many methods have subjected their steganography results to these noise attacks and measured the resistance of their algorithms against these attacks.

It is common to transfer the stego video through untrusted channels like the internet, wi-fi, and satellite. When being transferred, the image can be degraded because of the inclusion of noises through external disturbance. Many methods have tested the robustness of their method by adding noise to the stego videos and checking the security. Image noises with different levels of distortion densities are included and tested. In general, four different types of image noises are considered for testing, namely, Gaussian, salt and pepper, speckle, and periodic noises.

Salt and pepper noise is the common inclusion added to the image for testing the proposed method against steganalysis attacks. This noise is also called impulse valued noise, intensity spikes, and bipolar impulse noise. Salt and pepper noise happens when the original value of the pixels is replaced statistically with corrupted values. Salt and pepper noises are prone to occur during transmission of the video, malfunctioning of the camera sensors and memory. Even with the alterations of the original pixel values, the appearance of the images does not change.

Gaussian noises are the noises based on the Gaussian distribution function. Gaussian noise is also called statistical noise and is influenced by the probability density function and the normal distribution. Gaussian noises are caused naturally because of the image fluctuations. The main cause for Gaussian noise is during the image acquisition process like the faults in the sensors, changes in illumination, peak temperatures, and other electronic circuit noises. Spatial filters are used to smooth the Gaussian noises but they may affect the image quality due to the blurring of the edges. Gaussian noises can be modeled easily in images by replacing the original values with random values produced by a mathematical model.

Apart from the salt and pepper, Gaussian noises, speckle noises, and periodic noises are the other forms of noises available. Speckle noises are unwanted changes to the image signals caused by uneven changes in the scattering surface. Speckle noises are common in Synthetic Aperture Radar imaging, and laser and acoustic images. Speckle noises are multiplicative and exist in granular patterns in the form of artifacts, blurry edges and corners, and disturbing backgrounds. Speckle noise can be modeled by multiplying the original image pixel values with random values.

Periodic noises arise from electro-mechanical or electrical disturbances that occur during the image acquisition process. Images affected by periodic noise look as if a layer of repeated spikes has been added to the original image. Passing such images through frequency domain filters can reduce the noise considerably. However, the level and type of frequency filter depend on the application. From the video steganography perspective, these noises are modeled mathematically and introduced into the stego image. The efficiency of the proposed steganalysis method is determined by its ability to detect and decode the secret message in the noise-infused stego image.
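
The noise models discussed in this subsection can be simulated with a few lines of code. The sketch below is illustrative only: it assumes 8-bit intensities in a flat list, models Gaussian noise as additive and speckle noise as multiplicative, and the function names and parameters (`density`, `sigma`) are chosen for the example.

```python
import random


def clip(value):
    """Round and clamp a value to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def salt_and_pepper(pixels, density, rng):
    """Replace a fraction `density` of pixels with 0 (pepper) or 255 (salt)."""
    return [
        rng.choice((0, 255)) if rng.random() < density else p
        for p in pixels
    ]


def gaussian_noise(pixels, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation `sigma`."""
    return [clip(p + rng.gauss(0.0, sigma)) for p in pixels]


def speckle_noise(pixels, sigma, rng):
    """Multiplicative noise: each pixel is scaled by (1 + Gaussian sample)."""
    return [clip(p * (1.0 + rng.gauss(0.0, sigma))) for p in pixels]
```

A robustness test would apply one of these functions to each stego frame, for example with `rng = random.Random(0)` for reproducibility, run the extraction, and compare the recovered message with the original.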

Another metric to measure the robustness of the embedding is the Bit-Error Rate (BER). BER measures the fraction of bits that are altered in an image during manipulation. BER and the Signal-to-Noise Ratio (SNR) are inversely related: high values of SNR indicate higher similarity between the video transmitted and the video received, and the BER is correspondingly low. The equation for calculating the BER is given in (11).

$$ BER = \frac{{\sum}_{R,C}[I_{1}(r,c) \oplus I_{2}(r,c)]}{R*C} $$
(11)
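
Equation (11) amounts to counting the positions at which two bit sequences differ. A minimal sketch, assuming both inputs are equally long sequences of 0/1 values (for example, the embedded and the extracted secret bits):

```python
def bit_error_rate(sent, received):
    """Fraction of positions at which two bit sequences differ."""
    if len(sent) != len(received):
        raise ValueError("sequences must have the same length")
    if not sent:
        raise ValueError("sequences must not be empty")
    errors = sum(a ^ b for a, b in zip(sent, received))
    return errors / len(sent)
```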

5 Steganalysis: an overview

While steganography is the process of hiding secret information, steganalysis is the process of breaking the steganography algorithm to detect and uncover the secret information embedded. Steganalysis is classified into active steganalysis and passive steganalysis. Passive steganalysis detects the presence of the secret information alone, but active steganalysis detects and decodes/modifies the hidden secret information.

Steganalysis is important for two reasons; one for reversible steganography where steganalysis is required at the receiving end to extract the secret information embedded. Another reason for using steganalysis is to prevent the transfer of illegal information. There are numerous steganography tools easily available online which makes the need for steganalysis crucial. Even a layman can easily access the steganography tools and use them for sending confidential and prohibited information without raising any suspicions to government officials.

The need for steganalysis is substantial; however, steganalysis is not easy. With the proper selection of cover media, even the best steganalysis tool may not be able to break the steganography. Since no details about the cover media are available, breaking the steganography is difficult. Especially with video as the cover media, steganalysis can be challenging, as the correlation between the videos and the features is difficult to establish. Steganalysis methods are developed under the notion that the characteristics and features of the cover media are modified when the secret information is embedded. Many steganalysis methods work by comparing the features and other characteristics of the stego object and the cover object.

A thorough analysis of the existing steganography method and its advancement is required to formulate the steganalysis tool. A good steganalysis method should be able to break different steganography attacks and some of the existing steganalysis methods. Steganalysis is divided into two types, namely, specific steganalysis and universal steganalysis. Specific steganalysis methods are developed to deal with a particular type of steganography method. For example, reversible steganography is a specific steganalysis method, since this method can only break a particular steganography method and may not be able to work efficiently with other steganography methods. On the other hand, universal steganalysis methods aim at breaking all steganography methods. Specific steganalysis methods are possible, whereas, universal steganalysis methods are difficult to implement. The development of universal steganalysis methods should be in the direction of future works. Efforts for developing a single software that is capable of breaking any type of steganography should be invested.

5.1 Steganalysis techniques

Steganalysis methods are broadly classified into three types - signature steganalysis, statistical steganalysis, and feature-based steganalysis. Signature steganalysis, as the name suggests uses the signature left behind by the embedding method to detect the presence of the secret information. Statistical steganalysis uses statistical methods and mathematical formulas to detect and uncover secret information. Feature-based steganalysis methods extract features from the cover video and stego video for investigating the presence and thereby uncovering the secret information. A detailed branch diagram of the steganalysis techniques and their sub-branches is given in Fig. 7.

Fig. 7 Classification of different steganalysis techniques

5.1.1 Signature steganalysis

Further, signature steganalysis methods are divided into visual attacks and structural attacks. The first and foremost feature expected from video steganography is imperceptibility. The distortion caused by the video steganography method has to be minimum, or else the traces of the hidden message may become visible to the Human Visual System (HVS). Visual attacks are the simple steganalysis technique that can break the steganography using the HVS. A stego frame and the cover frame are compared side-by-side with the naked eye to check for any visible changes [104, 125]. Though visual attacks are easy to implement, they are not reliable. Not only reliability but also the automation and the requirement of experts to perform the testing are other disadvantages of using the visual attack.

The characteristics and features of the cover video change after embedding the secret information. Some of the characteristics, features and other structural components of the stego video are taken into account to detect the presence of the secret information. This type of steganalysis attack is called the structural attack. One example is the file size comparison [52]. After embedding, the file size of the cover video is prone to changes. Similar to visual attacks, structural attacks are not reliable and experts in the domain are required. Not all tampered videos will undergo structural changes and can escape the structural attack methods.

5.1.2 Statistical steganalysis

Statistical steganalysis utilizes the values of the image pixels and analyzes them to detect the confidential content. Statistical steganalysis is more powerful than signature steganalysis. Statistical methods use the knowledge of the image pixel values and mathematical models to detect and recover the secret information. Statistical steganalysis can be grouped into histogram analysis, Chi-square attacks, RS steganalysis, LSB embedding, LSB matching, pixel-pair analysis, bit plane analysis, JPEG compression, and transform domain steganalysis.

Histogram steganalysis analyzes the histograms of the cover video frame and the stego video frame to detect the presence of secret information. The histogram is a graphical representation of the distribution of the pixel values of the image. When a cover video is manipulated to embed the secret information, the histogram of the stego video is affected. The embedding of the secret information may not be visible to the human eye, but when the histogram is plotted and compared against that of the original cover video, even the slightest manipulation can be detected [31, 32, 41, 52, 64].

The Chi-square test is a common steganalysis technique used to detect the presence of secret information. The test measures how closely the observed frequency distribution matches the expected one, and thereby quantifies the randomness in the videos. Lower values of the test statistic indicate a higher degree of randomness, suggesting the presence of a secret message. Higher values mean a lower degree of randomness and suggest that the video has not been tampered with [52].
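
One common form of this test targets LSB embedding: it compares the counts of each pair of values that differ only in the least significant bit (2k and 2k + 1) with their common mean, which is the count expected if the LSBs were fully randomized. The sketch below computes only the statistic and the degrees of freedom. Turning these into a probability requires a chi-square distribution function, which is left out, and the flat list input is an assumption made for the example.

```python
from collections import Counter


def chi_square_pairs(pixels, levels=256):
    """Chi-square statistic over pairs of values (2k, 2k+1).

    Returns (statistic, degrees_of_freedom). A small statistic means the
    two counts in each pair are nearly equal, as LSB embedding produces.
    """
    hist = Counter(pixels)
    statistic = 0.0
    categories = 0
    for k in range(0, levels, 2):
        even, odd = hist.get(k, 0), hist.get(k + 1, 0)
        expected = (even + odd) / 2
        if expected > 0:  # skip pairs that never occur
            statistic += (even - expected) ** 2 / expected
            categories += 1
    return statistic, max(categories - 1, 0)
```

Pairs with very small expected counts are usually merged before the test is applied; that refinement is omitted here.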

RS steganalysis is another powerful tool, introduced by Fridrich et al. [25]. RS analysis is used to detect secret information that was embedded using LSB-based methods. It compares the pixel values of the image in the spatial domain. The selection of pixel groups varies with the method: sometimes neighboring pixels are chosen, and at other times pixels from different blocks are chosen for comparison. These groups of pixels are called Singular groups (S) and Regular groups (R). The presence of the secret is determined by grouping the pixels based on the frequency distribution and analyzing the LSBs of the stego and the cover video. The LSBs are flipped and randomized to detect the secret message [75, 102]. RS analysis has better reliability than chi-square tests [106].

LSB embedding and LSB matching steganalysis methods are based on the working principle of the LSB steganography method. These steganalysis methods are popular since the usage of LSB steganography methods is wide compared to other steganography methods [92]. Transform domain steganalysis converts the image into the frequency domain with magnitude and phase. Magnitude represents the frequency count values of the image and phase represents the direction to restore the image to its original form. The commonly used approaches for transform domain are Wavelet, Fourier, and Cosine Transform [106].

5.1.3 Feature-based steganalysis

Features are an important part of an image. Feature-based steganalysis methods extract the features from the images and analyze the features to detect the presence of secret information. These features can be further used in training a classifier to automatically detect the secret information using machine learning algorithms [106]. Pixel value differencing (PVD) steganography methods hide more bits of the secret information in the smoother regions of the cover image than in the complex regions. Histogram analysis of the PVD steganography revealed a Laplace distribution. A feature-based steganalysis method is used to detect the presence of secret information. Since PVD has the Laplace distribution, the expected frequency distribution of the image is obtained using any randomness test. The expected values are compared against the observed values and the degree of similarity is calculated. If the similarity is below a certain threshold, then the image has not been tampered with, else it has some embedded information [63].

6 Discussion, challenges and future directions

The video steganography or data hiding in video sequences can be achieved in multiple ways. This work discussed various data hiding approaches proposed in the last two decades. The data hiding approaches are classified based on the data hiding venue used in each method. Among those discussed approaches, LSB substitution in the spatial domain is the simple, easy as well as predominant method employed by the researchers in the literature. Later complex methods are introduced to further enhance the performance of data hiding. However, there are many challenges to developing a precise steganography system. Further steganography is a fast-evolving field in the information security domain. Here we listed a few challenges as well as future directions for video steganography.

  • Most of the LSB-based methods discussed in the work have displayed acceptable imperceptibility with high data hiding capacity. However, the LSB-based methods are not robust enough to withstand various attacks and noises. Moreover, state-of-the-art steganalysis methods can easily detect LSB substitution-based modification in video sequences, because in LSB-based approaches the modification is made directly on the raw pixel values of the frames.

  • Transforming the raw pixel values in the spatial domain to the frequency domain and hiding the secret data in the transformed frequency coefficients has displayed enhancement in security as well as robustness. DCT and DWT are two commonly used transform functions in steganography approaches for embedding in the transform domain. Different levels of DWT (first, second, third, etc.) were performed on the cover medium in a few methods to improve robustness as well as security. However, applying multiple levels of DWT on the cover medium reduces the embedding capacity. In the future, researchers can explore the effectiveness of transforms other than DCT and DWT for data embedding. Reference [109] displayed the effectiveness of CvT over DCT and DWT. To the best of our knowledge, no other methods in the literature have used the CvT function for embedding in transform coefficients.

  • In raw domain steganography, instead of serially selecting the pixel values or transform coefficients for embedding the secret data bits, the researchers employed game theory, genetic algorithm, or random number generator for random selection of pixels or transform coefficients. To an extent, the random selection of embedding venues has improved the security of the implemented method.

  • Adaptive steganography methods have been implemented in the literature for improved robustness and security. The adaptive methods employed certain artificial intelligence algorithms for detecting moving pixels and the secret data is hidden in the raw moving pixels or transformed coefficients of moving pixels. Furthermore, the skin lesions available in the video frames are utilized as the venue for data embedding. Edges of the objects available in the cover frame are also a suitable venue for data embedding. The existing methods [74, 82, 87, 104] have utilized conventional methods for detecting moving objects or skin lesions. The latest deep convolutional neural network-based methods can be employed in the future for effectively detecting moving objects and skin lesions. Moreover, video summarization techniques [54,55,56] can be utilized to identify key frames, and later the detected keyframes can be used as the data hiding venue.

  • The basic features of the steganography method are higher imperceptibility, higher security and robustness, and higher embedding capacity. But, it is not possible practically to achieve all the features. A threshold for the trade-off between imperceptibility, security, robustness, and capacity should be developed for practical use [28, 38] and [111]. To achieve better hiding capacity, more secret bits are embedded in the cover media, which may lead to the exposure of the presence of the secret media. Increasing the hiding capacity has compromised security, robustness, and imperceptibility. Based on the application scenario, which feature of the steganography algorithm can be compromised for the betterment of the other features can be decided.

  • Artificial intelligence is a standard framework acclaimed in many computer vision and other multimedia applications. Steganography is an image/video reconstruction technique where the main goal is to reconstruct a stego frame which is a combination of the cover and secret media. Artificial intelligence methods such as machine learning and neural networks are extensively used in image steganography and have been shown to improve imperceptibility, robustness, and computational cost. AI methods are optimal and provide adaptive solutions. However, AI methods are mostly used in image steganography [110, 111], and only a handful of studies [35, 79, 123] have focused on video steganography. Due to the generality of image steganography, the AI technology applied to image steganography has great reference value for video steganography.

  • Generative Adversarial Networks (GANs) are powerful artificial intelligence networks used in the image reconstruction field [27]. Image steganography using GANs is popular and has achieved greater performance with increased security, robustness, and capacity [58, 119, 120]. Using GANs for video steganography is a field that is still not explored. More studies can concentrate on utilizing GAN architectures for hiding secret information inside cover videos. Another method that can be explored is coverless steganography, where the cover object is generated or selected from a database based on the secret information [19, 73]. Coverless steganography has an added advantage, as there is no need to transfer the original cover object for steganalysis.

  • Robustness of the steganography method is measured by analyzing the resistance of the method against different noise attacks, compression attacks, and video/image manipulation attacks. The security of the method is measured by evaluating the resistance against steganalysis attacks. The proposed steganography method is subjected to these robustness attacks and against certain famous steganalysis techniques. The results are reported and analysed to measure the robustness and security of the steganography method [2, 57, 67, 88] and [137]. However, there is no assurance the proposed method will be resistant to all possible robustness and steganalysis attacks. Moreover, many video steganography methods have not reported their resistance against robustness and security attacks. In that case, it makes it difficult to honestly judge the efficiency of the steganography method. There is no unified metric to compare the performance of the different methods. The evaluation metric used is based on the convenience of the authors, as there is no established evaluation metric for steganography.

  • In the literature, several methods [67, 81, 85, 89] (including both raw domain-based methods and compressed domain-based methods) leveraged the merits of cryptography and error-correcting codes for enhancing security as well as robustness. Encrypting the secret data before embedding is widely adopted for securing the secret data in video steganography methods. Integration of encryption schemes provides an additional security layer. However, integrating encryption with steganography makes the whole encoding and decoding process time-consuming, and these approaches are computationally expensive compared to methods that implement steganography alone. Most of the ensemble methods proposed in the literature have not addressed or evaluated the time consumption issue. To achieve real-time encoding and decoding, future works can focus on the parallelization of encryption and steganography methods. Alternatively, efficient secret sharing schemes [26, 51] can be integrated with steganography.

  • A key consideration in video steganography is choosing a proper cover medium. Signature steganalysis methods use visible changes in the video to detect the presence of secret information. File size comparison is one such technique: it compares the size of the cover video with that of the stego video. A related measure is the bitrate change, which is the difference between the bit rate before embedding and after embedding. Not all steganography methods can maintain the same file size after embedding. Embedding destroys the optimal prediction of the original cover video, which increases the bitrate. The increase is larger for fast-motion videos and videos with complex textures [93]. The change in bitrate can thus be controlled by carefully selecting the cover video.

  • Reversible steganography is a steganography approach in which a steganalysis method for breaking the steganography method is developed alongside it [20]. It has a sender-receiver architecture, with the steganography algorithm placed at the sender side and steganalysis at the receiving end. Reversible steganography is common in image steganography; however, video steganography methods do not focus on the design of the steganalysis algorithm. Attention is given mostly to the video steganography methods themselves, which makes the implementation of the sender-receiver architecture for videos difficult. Video steganalysis methods are still not explored extensively. More attention can be given to video steganalysis and reversible video steganography in order to develop better techniques.

  • Specific steganalysis methods are easy to develop, whereas universal methods are difficult to develop. A universal steganalysis method should be able to detect the presence of secret information embedded using any kind of steganography method, without knowledge of the technique used. More universal steganalysis methods can be designed, in which a single method detects stego videos irrespective of the steganography method and with less computational work [92].

  • Any digital video sequence can be used as a dataset for hiding secret data and evaluating a proposed method. Most existing video steganography methods have been evaluated on public video datasets that were created for various computer vision, multimedia, and machine learning tasks. Among these, the Video Trace Library [107], UCI [18], YFCC100M [118], and PETS 2009 [24] are popular datasets in video steganography. The video steganography research domain still lacks a unified large dataset developed specifically for evaluating data embedding problems.

  • Encrypting the secret information is one way of adding a layer of security. Another way is to perform bifold steganography, where the steganography method is applied twice: once to create the stego object from the original cover and the secret information, and once more on the resulting stego object to provide additional security. The feasibility of bifold steganography can be studied, and its computational cost and time, imperceptibility, security, robustness, and practical usefulness can be reported.

  • Steganography methods can be classified into blind and non-blind methods based on the information required to decode the secret data. Blind methods do not require the original cover media for decoding, whereas non-blind methods do. Non-blind methods therefore have to transmit the original cover media to the receiver along with the cover media holding the secret data. Sending multiple copies of the cover media can make an attacker suspicious and may expose the existence of the secret data. In this context, blind steganography methods are more secure than non-blind methods.

  • Steganography is a very powerful tool for communicating confidential information between parties. As the technology advances, such tools can easily be exploited by terrorists and other anti-government bodies. Governments should introduce more regulations and restrictions on these kinds of tools to prevent them from falling into the hands of people with wrongful intentions.
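
As an illustration of the robustness evaluation and the bitrate measure discussed in the list above, the following is a minimal Python sketch. It is not a method from the surveyed literature. It assumes a toy least-significant-bit (LSB) embedding over a flat list of 8-bit samples standing in for video frame data, a salt-and-pepper noise attack, and the bit error rate (BER) of the extracted payload as the robustness score. All function names are hypothetical, and the relative bitrate change shown here is only one assumed way of normalizing the before/after difference.

```python
import random


def embed_lsb(cover, bits):
    """Toy LSB embedding (assumed for illustration): store one payload bit
    in the lowest bit of each of the first len(bits) cover samples."""
    if len(bits) > len(cover):
        raise ValueError("payload is longer than the cover")
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(stego, n_bits):
    """Blind extraction: read the lowest bit of the first n_bits samples."""
    return [sample & 1 for sample in stego[:n_bits]]


def salt_and_pepper(samples, density, rng):
    """Noise attack: replace roughly a fraction `density` of the samples
    with 0 or 255."""
    attacked = list(samples)
    for i in range(len(attacked)):
        if rng.random() < density:
            attacked[i] = rng.choice((0, 255))
    return attacked


def bit_error_rate(sent, received):
    """Fraction of payload bits that differ after extraction."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have equal length")
    if not sent:
        return 0.0
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)


def bitrate_increase_ratio(cover_bitrate, stego_bitrate):
    """Relative bitrate change after embedding: (after - before) / before."""
    if cover_bitrate <= 0:
        raise ValueError("cover bitrate must be positive")
    return (stego_bitrate - cover_bitrate) / cover_bitrate


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    cover = [rng.randrange(256) for _ in range(4096)]
    payload = [rng.randrange(2) for _ in range(1024)]
    stego = embed_lsb(cover, payload)

    for density in (0.0, 0.05, 0.2):
        attacked = salt_and_pepper(stego, density, rng)
        recovered = extract_lsb(attacked, len(payload))
        print(f"noise density={density:.2f}  BER={bit_error_rate(payload, recovered):.4f}")

    # Hypothetical bitrates in kbit/s before and after embedding.
    print("bitrate increase ratio:", bitrate_increase_ratio(1000.0, 1025.0))
```

With a noise density of zero the payload is recovered exactly, so the BER is 0. The BER tends to grow with the density because plain LSB embedding has no error protection, which is one reason the methods above combine embedding with error-correcting codes.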

7 Conclusion

This work provides a comprehensive summary of recent advances in video steganography systems. We classified the existing methods into different categories based on their data hiding venues. Along with a detailed explanation of the video steganography methods proposed over the last decades, this work examined their benefits and drawbacks. It also discussed the features to be considered when designing an efficient and effective data hiding method, and provided a brief introduction to steganalysis techniques. The article concludes by discussing various challenges and potential directions for future research in video steganography.