Optimal keyframe selection-based lossless video-watermarking technique using IGSA in LWT domain for copyright protection

Video piracy is a challenging issue in the modern world. Approximately 90% of newly released films are illegally distributed around the world via the Internet. To overcome this issue, video watermarking is an effective process that integrates a logo into video frames as a watermark. Therefore, this paper presents an efficient lossless video-watermarking scheme based on optimal keyframe selection using an intelligent gravitational search algorithm in the linear wavelet transform domain. This technique separates color motion and motionless frames of the cover video by the histogram difference method. A one-level linear wavelet transform is performed on the chrominance channel of the motion frames, and the low-frequency sub-band LL is selected for watermark embedding. The performance of the proposed technique has been evaluated against 12 video processing attacks in terms of imperceptibility and robustness. Experiments demonstrate that the proposed technique outperforms five state-of-the-art schemes on the considered attacks.


Introduction
In recent years, unauthorized users have been able to easily access digital media content (image, audio, and video). This content is illegally copied, manipulated, and distributed [1,2] across the globe over the internet. Around 90% of newly released movies are illegally recorded by camcorder devices and distributed globally via the internet [3]. The Dark Knight movie is one example of video piracy: 7 million copies were delivered illegally within 6 months of its release [4]. As a result, copyright and content security are the most prevalent issues in the modern world. To address this, video watermarking is one of the promising solutions. An effective watermarking system is characterized by three parameters: imperceptibility, robustness, and payload capacity. Imperceptibility measures the efficiency of a watermarking method in concealing a watermark in the cover video frames, while robustness measures the efficiency of a watermarking process in recovering the watermark from video frames after video attacks. Moreover, payload capacity refers to the number of bits embedded within each frame. The complexity of a watermarking method often increases with increased payload capacity. In the literature, it has been witnessed that these three parameters are contradictory and constrain each other [6]. Therefore, it is a difficult task to develop a watermarking system that is efficient in terms of these parameters along with low computational complexity.
In the literature, many video-watermarking schemes have been presented to achieve the objectives mentioned above. Table 1 lists some of the popular video-watermarking schemes. It can be observed that most schemes perform watermark embedding in the DWT, DCT, and SVD domains [9-15]. Generally, such schemes are computationally complex [28] and unable to recover a lossless watermark image due to the shift-variant property [29]. Moreover, a few schemes are based on the linear wavelet transform (LWT) [17-19]. The LWT-based video-watermarking schemes are resistant to both image processing attacks and temporal video attacks such as impulse noise, Gaussian noise, filtering, and compression attacks, but fail against geometric attacks [25-28,30]. Further, it can be seen in Table 1 that existing video-watermarking schemes incorporate the watermark into the non-motion frames of the luminance component, which results in poor imperceptibility [31]. Moreover, most video-watermarking schemes employ a single scaling factor approach that significantly affects the balance between imperceptibility and robustness. To attain a better equilibrium between imperceptibility and robustness, the integration of video-watermarking schemes with multiple scaling factors (MSF) is a promising solution [20-22,24,32]. However, the selection of the optimal values of the MSF is an NP-complete problem [33], which can be addressed by employing meta-heuristic algorithms.
Meta-heuristic algorithms are optimization algorithms that imitate the optimization behavior of natural phenomena [34,35]. The gravitational search algorithm (GSA) [36] is one such meta-heuristic algorithm, inspired by the Newtonian law of gravity. In GSA, the optimal solution is obtained through a collection of objects that coordinate with each other according to the law of gravity and the law of motion [37]. In comparison to other meta-heuristic algorithms, GSA has a low computational cost and a high convergence rate [38]. In addition, GSA has been broadly acknowledged in the literature for multimodal challenges, notably for clustering applications [39]. Moreover, GSA finds the best solution using the current positions only, and therefore it is considered a memory-less algorithm [36,40]. However, due to a lack of population diversity and an inappropriate balance between exploration and exploitation, it often stagnates in local optima [41]. In the literature, researchers have proposed several variants of GSA. Liu et al. [42] suggested dynamically adapting inertia factors to improve the position update. Further, Olivas et al. [43] introduced an interval type-2 fuzzy system-based modified variant of GSA which improves the exploitation and exploration of the search space. The adaptive GSA (AGSA) presented by Mirjalili et al. [44] allows the exploitation of the GSA to be modified based on the current situation. A variation of GSA called the exponential kbest gravitational search algorithm (eKGSA) was introduced by Mittal et al. [45] to find optimal thresholds for multi-level image segmentation. The hierarchical gravitational search algorithm proposed by Wang et al. [46] deals with premature convergence and low search capacity. Rawal et al. [47] presented a fast convergent GSA which utilizes a sigmoidal function and an exponential step size to accelerate convergence and exploitation. Recently, Mittal et al. [48] presented a new variant of GSA, the intelligent gravitational search algorithm (IGSA), which outperformed GSA regarding convergence rate and solution precision.
Therefore, the key contributions of the paper are twofold: (1) a new video-watermarking technique is proposed, termed the lossless video-watermarking technique using the intelligent gravitational search algorithm and Hessenberg transform in the linear wavelet transform domain (IGSA-LH), and (2) to attain equilibrium between imperceptibility and robustness, the intelligent gravitational search algorithm (IGSA) is leveraged to acquire an optimal set of multiple scaling factors. For experimental analysis, the proposed technique has been evaluated on four standard benchmark videos against 12 image and video attacks in terms of imperceptibility parameters, namely mean peak signal-to-noise ratio (MPSNR) and mean structural similarity index (MSSIM), and a robustness parameter, i.e., mean normalized correlation (MNC). Further, the obtained results are compared with five existing video-watermarking techniques, namely Karmakar et al. [2], Bhardwaj et al. [49], Farri et al. [17], Kuraparthi et al. [24], and Agilandeeswari et al. [50].
The remainder of the paper is organized as follows. The preliminaries for the proposed technique are presented in the next section. The third section describes the proposed technique, followed by the experimental findings in the fourth section. Finally, the last section draws the conclusion.

Linear wavelet transform
DWT is a first-generation wavelet transform that yields floating-point coefficients. These coefficients may be altered during subsequent processing; as a result, information is lost during watermark embedding due to the truncation of floating-point pixel values. To alleviate this, the linear wavelet transform (LWT) is an extended version of DWT based on second-generation wavelets [52]. LWT replaces the up- and down-sampling of DWT with split and merge operations at each level; because of this split-and-merge procedure, its computational complexity is nearly half that of DWT [51]. LWT splits an image into a low-frequency (LL) and high-frequency (LH, HL, HH) sub-bands. In contrast to DWT, LWT maps integer pixel values to integers, resulting in lossless, computationally faster, and reliable execution. There are three key steps, namely split, predict, and update, which are discussed below. The complete procedure of LWT is illustrated in Fig. 1.

1. Splitting: Consider an image F(m, n) that is split into two sections, i.e., an even part (F_e(m, n)) and an odd part (F_o(m, n)), which are defined by Eqs. (2) and (3), respectively.
2. Dual lifting (predict): The prediction operator (P) combines several even samples and applies the resulting value to the actual odd part; i.e., the odd part is predicted from the even coefficients in its local neighborhood. The prediction error of the odd part, defined by Eq. (4), yields the high-frequency coefficients (h(m, n)), and the updated value of the odd part is given by Eq. (5).
3. Primal lifting (update): Similarly, the low-frequency coefficients (l(m, n)) are produced by modifying the even part with the update value (U(h(m, n))), which is given by Eq. (6). Further, U(h(m, n)) is computed from F(m, n) using Eq. (7).
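Since Eqs. (2)-(7) are not reproduced here, the split-predict-update pipeline can be sketched with the simplest lifting pair, the integer Haar lifting; the concrete predict and update operators below are illustrative assumptions, not necessarily the ones used in the paper. The point of the sketch is the lossless integer-to-integer property claimed above:

```python
import numpy as np

def lwt_1d(signal):
    """One level of integer Haar lifting on a 1-D signal.

    Split -> predict -> update, with floor division so the transform
    maps integers to integers (perfectly invertible, i.e., lossless).
    """
    even, odd = signal[0::2].copy(), signal[1::2].copy()
    h = odd - even          # predict: detail (high-frequency) coefficients
    l = even + (h // 2)     # update: approximation (low-frequency) coefficients
    return l, h

def ilwt_1d(l, h):
    """Inverse lifting: undo the update, undo the predict, then merge."""
    even = l - (h // 2)
    odd = h + even
    out = np.empty(even.size + odd.size, dtype=l.dtype)
    out[0::2], out[1::2] = even, odd
    return out

# Round trip on integer pixel values is exact, unlike a float DWT.
x = np.array([52, 55, 61, 66, 70, 61, 64, 73])
l, h = lwt_1d(x)
assert np.array_equal(ilwt_1d(l, h), x)
```

A 2-D one-level LWT would apply this along rows and then columns, producing the LL, LH, HL, and HH sub-bands referred to above.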

Arnold transform
The Arnold transform (AT) is a technique for scrambling images that improves protection and identifies their true owner [52]. The AT is used to scramble the watermark logo before embedding: it iteratively shuffles the pixel positions to create a new chaotic image, resulting in a scrambled image. An unauthorized user cannot retrieve the watermark logo from the watermarked image without knowing the security key. The AT is applied to a square watermark logo of dimension R2 x R2 as depicted in Eq. (8), where (a, b) represents the watermark pixel positions and (a', b') represents the scrambled image pixel positions at the ith iteration. The inverse Arnold transform (IAT) recovers the watermark logo through Eq. (9).
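Equations (8) and (9) are not reproduced above; the sketch below assumes the classical Arnold cat map, (a, b) -> ((a + b) mod N, (a + 2b) mod N), with the iteration count playing the role of the security key. The exact map matrix in the paper may differ:

```python
import numpy as np

def arnold(img, iterations=1):
    """Scramble a square N x N image with the Arnold cat map."""
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        a, b = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        scrambled = np.empty_like(out)
        scrambled[(a + b) % n, (a + 2 * b) % n] = out[a, b]  # Eq. (8), assumed form
        out = scrambled
    return out

def inverse_arnold(img, iterations=1):
    """Recover the original image by applying the inverse map (IAT)."""
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        a, b = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        restored = np.empty_like(out)
        restored[(2 * a - b) % n, (b - a) % n] = out[a, b]   # Eq. (9), assumed form
        out = restored
    return out

w = np.arange(64).reshape(8, 8)          # toy 8x8 watermark logo
s = arnold(w, iterations=5)
assert not np.array_equal(s, w)                             # scrambled
assert np.array_equal(inverse_arnold(s, iterations=5), w)   # lossless recovery
```

Because the map is a permutation of pixel positions, the recovery is exact, which is what makes AT suitable for a lossless watermarking pipeline.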

Hessenberg transform
The Hessenberg transform (HT) is a procedure for factorizing a general matrix S using orthogonal similarity transformations [53,54]. The HT of a matrix S is given by Eq. (10), where Q and H are an orthogonal matrix and an upper Hessenberg matrix, respectively, such that h_ij = 0 for i > j + 1. Usually, the Hessenberg transform is computed with Householder matrices. A Householder matrix P is an orthogonal matrix defined by Eq. (11), where I and u are the m x m identity matrix and a nonzero vector in R^m, respectively. The overall procedure consists of m - 2 steps for a matrix S of size m x m. Therefore, H is computed by Eq. (12), where Q = P_1 P_2 ... P_{m-3} P_{m-2}. For example, performing HT on a 4 x 4 matrix S yields an orthogonal matrix Q and an upper Hessenberg matrix H.
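The factorization can be reproduced with `scipy.linalg.hessenberg`, which uses exactly the Householder-based procedure described above. The 4 x 4 matrix here is an illustrative stand-in, not the paper's example matrix:

```python
import numpy as np
from scipy.linalg import hessenberg

# A general 4x4 matrix, analogous to the paper's 4x4 example.
S = np.array([[4., 1., 2., 3.],
              [2., 5., 1., 0.],
              [1., 2., 6., 1.],
              [3., 0., 1., 7.]])

# hessenberg() factorizes S = Q @ H @ Q.T via Householder reflections.
H, Q = hessenberg(S, calc_q=True)

assert np.allclose(Q @ H @ Q.T, S)        # similarity transform reconstructs S
assert np.allclose(Q @ Q.T, np.eye(4))    # Q is orthogonal
assert np.allclose(np.tril(H, -2), 0)     # h_ij = 0 for i > j + 1
```

The last assertion checks the defining Hessenberg structure: everything below the first subdiagonal is zero.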

Intelligent gravitational search algorithm (IGSA)
Mittal et al. [48] proposed a variant of the gravitational search algorithm, the intelligent gravitational search algorithm (IGSA), to improve solution precision. IGSA focuses on enhancing the exploitation ability among the objects. To do so, the position equation is modified by including the global best solution (gBest) and the global worst solution (gWorst): in IGSA, the attraction of objects towards gBest is proportional to gWorst. Mathematically, the modified position equation of IGSA for the ith object in the dth dimension at the tth iteration is depicted in Eq. (18), where r(t) is in [0, 1] and rho is a constant whose value is taken as 0.9.

Proposed technique
The proposed technique, termed as a lossless video-watermarking technique using intelligent gravitational search algorithm and Hessenberg transform in linear wavelet transform (IGSA-LH), is explained in the following four sections: identification of motion frames and keyframes ("Identification of motion frames and keyframes"), embedding process ("Embedding process"), extraction process ("Extraction process"), and finding optimal scaling factors through IGSA ("Selection of multiple scaling factors using IGSA algorithm"). The overall procedure of the proposed technique is illustrated in Fig. 3.

Identification of motion frames and keyframes
Motion frames are identified by performing the histogram difference method on the cover video [55]. The complete keyframe selection procedure is illustrated in Fig. 4. In this method, the absolute histogram difference between two consecutive video frames is calculated. If this difference is greater than a predefined threshold, the frame is identified as a motion frame. Subsequently, keyframes are detected from the selected motion frames based on entropy: if the entropy of a single motion frame is greater than the average entropy of all motion frames, that motion frame is identified as a keyframe. Figure 5 depicts the identified keyframes corresponding to each cover video. The detailed workflow for identifying keyframes from the motion frames is given below:
1. Divide the cover video into frames and compute the histogram of each frame.
2. Compute the sum of the absolute histogram differences (h1 - h2) between two adjacent frames.
3. Compare the sum of the absolute differences to a predefined threshold [55]. If the sum exceeds the threshold, the frame is marked as a motion frame (MF).
4. Continue until the last frame of the video.
5. Calculate the entropy of all the selected motion frames.
6. If a single frame's entropy is greater than the average entropy of the motion frames, that frame is selected as a keyframe (k_f).
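The workflow above can be sketched in a few lines of numpy. The threshold value and the grayscale-frame representation are assumptions for illustration; the paper applies the method to color video frames:

```python
import numpy as np

def select_keyframes(frames, threshold, bins=256):
    """Keyframe selection by histogram difference plus entropy (a sketch).

    `frames` is a list of 2-D uint8 arrays; `threshold` is the predefined
    histogram-difference threshold (its value here is an assumption).
    """
    # Steps 1-4: motion frames via absolute histogram difference of adjacent frames
    motion = []
    for prev, curr in zip(frames, frames[1:]):
        h1, _ = np.histogram(prev, bins=bins, range=(0, 256))
        h2, _ = np.histogram(curr, bins=bins, range=(0, 256))
        if np.abs(h1 - h2).sum() > threshold:
            motion.append(curr)

    # Steps 5-6: keep motion frames whose entropy exceeds the average entropy
    def entropy(f):
        p, _ = np.histogram(f, bins=bins, range=(0, 256))
        p = p / p.sum()
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    ents = [entropy(f) for f in motion]
    avg = np.mean(ents) if ents else 0.0
    return [f for f, e in zip(motion, ents) if e > avg]
```

With this design, a flat (low-entropy) motion frame is filtered out even if it differs strongly from its predecessor, which matches the entropy criterion in step 6.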

Embedding process
In this section, a scrambled watermark is incorporated into the keyframes. The following steps are performed while embedding the watermark:
1. Convert the keyframe (K_f) into the YUV color space and select the chrominance component (K_v) by Eq. (19).
2. Decompose K_v into four sub-bands by performing 1-level LWT using Eq. (20) and select sub-band LL_v for further processing.
3. Perform HT on sub-band LL_v using Eq. (21).
4. Select the watermark logo (w) of size (R2 x R2) and perform AT to obtain the scrambled watermark (s_w) for security enhancement.
5. Decompose the scrambled watermark (s_w) into four sub-bands by performing 1-level LWT using Eq. (22) and select sub-band LL_w for further processing.
6. Perform HT on sub-band LL_w using Eq. (23).
7. Apply IGSA to obtain the set of MSF (alpha) according to the procedure detailed in "Selection of multiple scaling factors using IGSA algorithm".
8. Embed the component H_w into H_v using Eq. (24).
9. Apply inverse HT (IHT) on the modified matrix (H_v) to obtain the Hessenberg-embedded matrix (LL_v) by Eq. (25).
10. Apply inverse LWT (ILWT) on the matrix (LL_v) to obtain the watermarked video keyframes (W_v).
11. Finally, convert all YUV watermarked video keyframes into RGB and merge them with the remaining video frames to obtain the watermarked video (V).

Extraction process
In extraction, the reverse operation of the embedding process is performed to recover the watermark logo, as presented below:
1. Divide the RGB watermarked video (V) into k frames, namely f_1, f_2, ..., f_k, and extract the keyframes (k_f).
2. Convert the watermarked video keyframes (k_f) into the YUV color space and select the chrominance component by Eq. (26).
3. Perform 1-level LWT using Eq. (27) and consider sub-band LL_v for extraction.
4. Apply HT on sub-band LL_v of the watermarked video keyframes by Eq. (28).
5. Extract the watermark logo using Eq. (29).
6. Apply inverse HT (IHT) to obtain the matrix LL_w by Eq. (30).
7. Apply ILWT on the matrix LL_w to obtain the extracted scrambled watermark (LL_ew) using Eq. (31).
8. Finally, apply IAT to recover the watermark logo (w').
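Equations (24) and (29) are not reproduced above; a common choice in HT-domain watermarking, assumed here, is the additive rule H'_v = H_v + alpha * H_w with the non-blind extraction H_w = (H'_v - H_v) / alpha (the original H_v and the orthogonal matrices are retained as side information). A minimal numpy/scipy sketch of the embed-extract round trip, with made-up sub-band data and alpha:

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(0)
LL_v = rng.random((8, 8))   # stand-in for the LL sub-band of a keyframe
LL_w = rng.random((8, 8))   # stand-in for the LL sub-band of the watermark
alpha = 0.05                # one scaling factor from the MSF set (assumed value)

# HT of both sub-bands (Eqs. 21 and 23)
H_v, Q_v = hessenberg(LL_v, calc_q=True)
H_w, Q_w = hessenberg(LL_w, calc_q=True)

# Assumed additive embedding rule (Eq. 24) and inverse HT (Eq. 25)
H_emb = H_v + alpha * H_w
LL_emb = Q_v @ H_emb @ Q_v.T

# Non-blind extraction: Q_v, Q_w, and H_v are retained as side information
H_ext = Q_v.T @ LL_emb @ Q_v        # HT of the watermarked sub-band (Eq. 28)
H_w_rec = (H_ext - H_v) / alpha     # assumed extraction rule (Eq. 29)
LL_w_rec = Q_w @ H_w_rec @ Q_w.T    # inverse HT of the watermark (Eq. 30)

assert np.allclose(LL_w_rec, LL_w)  # lossless watermark recovery
```

In the absence of attacks the recovery is exact, which illustrates the "lossless" claim; under attacks the extracted H_w is only approximate and the MNC measures its quality.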

Selection of multiple scaling factors using IGSA algorithm
The scaling factor is a critical parameter in watermark embedding, as it regulates imperceptibility and robustness simultaneously. However, a single scaling factor fails to maintain the trade-off between these two parameters. Therefore, the IGSA algorithm is employed to identify the optimal set of MSF through the objective function given in Eq. (32), where MSSIM(f, f') and MNC(f, f') evaluate the mean structural similarity index and mean normalized correlation between the cover video frames (f) and watermarked video frames (f'), MNC(w, w') measures the mean normalized correlation between the watermark logo (w) and the extracted watermark logo (w'), and K denotes the total number of performed attacks. The procedure is as follows:
1. Initialize the IGSA population, where each individual represents a candidate set of scaling factors.
2. Evaluate each individual using the objective function of Eq. (32).
3. Update each individual according to Eq. (18).
4. Continue this process until the maximum number of iterations is reached.
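Since Eq. (32) is not reproduced here, the sketch below shows the normalized correlation used inside it, plus one plausible (assumed) form of the fitness: frame fidelity terms plus the watermark MNC averaged over the K attacks. The exact weighting in the paper may differ:

```python
import numpy as np

def nc(w, w_ext):
    """Normalized correlation between original and extracted watermark."""
    w = w.astype(float)
    w_ext = w_ext.astype(float)
    return (w * w_ext).sum() / np.sqrt((w ** 2).sum() * (w_ext ** 2).sum())

def fitness(mssim, mnc_frames, mnc_attacks):
    """Assumed form of the objective in Eq. (32): frame fidelity
    (MSSIM and MNC between cover and watermarked frames) plus the
    average watermark MNC over the K simulated attacks."""
    return mssim + mnc_frames + np.mean(mnc_attacks)

w = np.random.default_rng(1).integers(0, 2, (32, 32))
assert np.isclose(nc(w, w), 1.0)   # identical logos give NC = 1
```

IGSA then searches the space of scaling-factor vectors, re-embedding and re-attacking the video for each candidate and keeping the vector with the highest fitness.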

Results and discussions
Experiments are simulated in MATLAB 2020a on a system with a 2.5 GHz i5 processor and 8 GB RAM. The efficacy of the proposed technique (IGSA-LH) has been evaluated on four standard benchmark videos, Silent, Foreman, Mobile, and Hall monitor, in terms of imperceptibility and robustness. Imperceptibility is examined between the cover video and the watermarked video using MPSNR and MSSIM, while robustness is evaluated between the watermark logo and the extracted watermark logo using MNC. The cover videos and watermark logo are taken from online available databases [56,57]. The parameter settings of the considered techniques are taken from the respective literature. The experimental results are studied in the following sections: "Imperceptibility analysis of the proposed technique" analyzes the imperceptibility, while robustness is examined against the considered attacks in "Robustness analysis of the proposed technique". "Performance analysis against existing techniques" presents the comparative analysis of the proposed technique (IGSA-LH) with five recent video-watermarking schemes under the considered attacks in terms of MPSNR, MSSIM, and MNC values.
In addition, statistical validation of the proposed technique is discussed in "Statistical Analysis of the proposed technique". Finally, "Comparative analysis of time complexity" provides the comparative analysis of the considered techniques in terms of time complexity.

Imperceptibility analysis of the proposed technique
Imperceptibility evaluates the quality of the watermarked video frames. It is assessed between the cover video frames (f) and the watermarked video frames (f'). The performance of the proposed technique (IGSA-LH) has been examined on all four considered videos in terms of MPSNR and MSSIM, which are defined by Eqs. (33) and (34), respectively, where F denotes the total number of video frames. Figure 6 shows the quality of the watermarked video frames, which are quite similar to the keyframes depicted in Fig. 5.
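Equations (33) and (34) are not reproduced above; MPSNR is simply the per-frame PSNR averaged over the F frames, which can be sketched as follows (the 8-bit peak value of 255 and the nonzero-MSE assumption are standard):

```python
import numpy as np

def mpsnr(cover_frames, marked_frames, peak=255.0):
    """Mean PSNR over F frames (a sketch of Eq. (33)).

    Both arguments are sequences of equally shaped uint8 frames; assumes
    every frame pair differs somewhere (MSE > 0), else PSNR is infinite.
    """
    psnrs = []
    for f, fw in zip(cover_frames, marked_frames):
        mse = np.mean((f.astype(float) - fw.astype(float)) ** 2)
        psnrs.append(10 * np.log10(peak ** 2 / mse))
    return float(np.mean(psnrs))
```

MSSIM is defined analogously as the per-frame SSIM averaged over frames; an SSIM implementation is omitted here for brevity.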

Robustness analysis of the proposed technique
Robustness measures the quality of the extracted watermark logo under various image and video attacks. It is assessed between the original watermark logo and the extracted watermark logo in terms of MNC. The extracted watermark logos are shown in Fig. 7. It can be observed from the figure that the proposed technique recovers a superior-quality watermark against most of the considered attacks.

Performance analysis against existing techniques
The proposed technique (IGSA-LH) has been compared with five recent video-watermarking techniques, namely Karmakar et al. [2], Bhardwaj et al. [49], Farri et al. [17], Kuraparthi et al. [24], and Agilandeeswari et al. [50]. "Comparative analysis of imperceptibility" discusses the imperceptibility performance of the proposed technique against the existing techniques in terms of MPSNR and MSSIM, while "Comparative analysis of robustness" studies the robustness of the considered techniques in terms of MNC on 12 video attacks. The comparative results are visualized in Fig. 8, and Tables 4-15 illustrate the comparative robustness of the considered techniques against 12 different video attacks.
These results confirm that the average MNC value of the IGSA-LH technique is superior to the compared schemes, which shows the IGSA-LH technique's robustness under the wiener filtering attack. (iii) Gaussian filtering attack: Gaussian filtering with kernel size 3 x 3 is applied to the watermarked video frames. Table 6 presents the corresponding average MNC values (0.9992, 0.9837, 0.9965, 0.9990, 0.9910, ...). Moreover, the average MNC values of all the considered videos for each attack against the considered schemes are also presented graphically for better visualization in Fig. 9. It can be seen from the figure that the proposed technique outperforms the compared schemes on all considered attacks except noise attacks. Therefore, the proposed technique is resilient to the considered attacks.

Statistical analysis of the proposed technique
To statistically validate the performance of the proposed technique (IGSA-LH), a non-parametric Friedman test is performed for each considered performance parameter, i.e., MPSNR, MSSIM, and MNC. The test involves two hypotheses, the null hypothesis (H0) and the alternative hypothesis (H1). Under H0, all parameter values generated by the comparative methods are statistically equal, while H1 states that the comparative methods are significantly different. The p value returned by the test is 0.01 for 30 different executions, which is less than the considered significance level (alpha = 0.05). Therefore, H0 is rejected and the obtained results are significantly different. Further, Table 16 depicts the ranking of the considered techniques for each parameter, where the proposed technique ranked first. It can be confirmed from the table that the proposed technique is statistically better than the considered techniques in terms of each parameter.
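The Friedman test above can be reproduced with `scipy.stats.friedmanchisquare`. The scores below are illustrative numbers for three hypothetical techniques over five runs, not the paper's data:

```python
from scipy.stats import friedmanchisquare

# MPSNR scores of three hypothetical techniques over five executions
tech_a = [48.1, 48.3, 47.9, 48.2, 48.0]
tech_b = [45.0, 45.2, 44.8, 45.1, 44.9]
tech_c = [46.5, 46.7, 46.3, 46.6, 46.4]

# The test ranks the techniques within each execution and checks whether
# the mean ranks differ significantly across techniques.
stat, p = friedmanchisquare(tech_a, tech_b, tech_c)

# Reject H0 (statistically equal performance) when p < 0.05
print(f"statistic={stat:.3f}, p={p:.4f}")
```

Here tech_a ranks first in every execution, so the ranks are perfectly consistent and the test rejects H0, mirroring the paper's conclusion that the per-parameter differences are significant.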

Comparative analysis of time complexity
The proposed technique has also been compared with five recent state-of-the-art schemes in terms of embedding and extraction time for 300 video frames of each considered video.

Conclusion
This paper presents a lossless and efficient video-watermarking technique based on optimal keyframe selection using IGSA and HT in the LWT domain. In this scheme, a scrambled watermark logo is incorporated into the keyframes following a one-level LWT. The IGSA algorithm acquires a set of MSF which maintains the equilibrium between imperceptibility and robustness. The security of the IGSA-LH technique has been improved by performing the Arnold transform on the watermark logo prior to embedding. The experimental results were validated against 12 video attacks and compared with five recent state-of-the-art schemes. The comparative analysis of imperceptibility and robustness validates that the IGSA-LH technique is quite resilient to attacks. In future work, the performance of the proposed technique can be evaluated on color watermark logos over different video attacks along with various parameters. Further, the elliptic curve cryptography (ECC) technique can be applied to the proposed technique for security enhancement. Moreover, the applicability of the proposed technique can be extended to big-data applications.

Table 5 The robustness (MNC) of the considered techniques under wiener filtering attack for the considered videos.

Table 6 The robustness (MNC) of the considered techniques under Gaussian filtering attack for the considered videos.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.