Introduction

In recent years, unauthorized users have been able to easily access digital media content (image, audio, and video). Such content is illegally copied, manipulated, and distributed across the globe over the internet [1, 2]. Around \(90\%\) of newly released movies are illegally recorded with camcorder devices and distributed globally via the internet [3]. The Dark Knight is one example of video piracy: 7 million illegal copies of the movie were distributed within 6 months of its release [4]. As a result, copyright and content security are among the most prevalent issues in the modern world. To address them, video watermarking is a promising solution that embeds information into video frames for copyright protection [5].

Many watermarking algorithms have been proposed over the last decade and are classified based on specific characteristics. In general, video-watermarking methods operate in two domains, namely spatial and frequency. In the spatial domain, a watermark is embedded by directly modifying the pixels. Approaches in this domain have low computational complexity; however, they suffer from poor data-hiding ability and low robustness [6, 7]. Frequency-domain methods, on the contrary, are considerably more effective in these respects. In the frequency domain, the watermark is integrated with the transform coefficients of the video frames. Some of the popular frequency-domain transforms are the discrete Fourier transform (DFT), singular value decomposition (SVD), discrete cosine transform (DCT), and discrete wavelet transform (DWT), of which DWT is often employed due to its multi-resolution capabilities. Frequency-domain methods accomplish high payload capacity, imperceptibility, robustness, and security [6]. However, they are computationally expensive compared to spatial-domain methods.

Generally, researchers focus on three parameters of a video-watermarking method, i.e., imperceptibility, robustness, and payload capacity [8]. In video watermarking, imperceptibility corresponds to the effectiveness of a watermarking method in concealing a watermark within the cover video frames, while robustness measures the efficiency of the watermarking process in recovering the watermark from video frames after attacks. Payload capacity refers to the number of bits embedded within each frame; the complexity of a watermarking method often increases with payload capacity. The literature shows that these three parameters conflict with and constrain one another [6]. Therefore, it is a difficult task to develop a watermarking system that is effective in terms of all these parameters while keeping computational complexity low.

Table 1 Comparative description of video-watermarking schemes in frequency domain

In the literature, many video-watermarking schemes have been presented to achieve the objectives mentioned above. Table 1 lists some of the popular video-watermarking schemes. It can be observed that most schemes perform watermark embedding in the DWT, DCT, and SVD domains [9,10,11,12,13,14,15]. Generally, such schemes are computationally complex [28] and unable to recover a lossless watermark image due to the shift-variant property [29]. Moreover, a few schemes are based on the linear wavelet transform (LWT) [17,18,19]. The LWT-based video-watermarking schemes are resistant to image processing and temporal video attacks such as impulse noise, Gaussian noise, filtering, and compression, but fail against geometric attacks [25,26,27,28, 30]. Further, Table 1 shows that existing video-watermarking schemes incorporate the watermark into the non-motion frames of the luminance component, which results in poor imperceptibility [31]. Moreover, most video-watermarking schemes employ a single scaling factor (SEF) approach that significantly affects the balance between imperceptibility and robustness. To attain a better equilibrium between imperceptibility and robustness, integrating video-watermarking schemes with multiple scaling factors (MSF) is a promising solution [20,21,22, 24, 32]. However, the selection of the optimal MSF values is an NP-complete problem [33], which can be addressed by employing meta-heuristic algorithms.

Meta-heuristic algorithms are optimization algorithms that imitate the optimization behavior of natural phenomena [34, 35]. The gravitational search algorithm (GSA) [36] is one such meta-heuristic, inspired by the Newtonian law of gravity. In GSA, the optimal solution is obtained through a collection of objects that coordinate with each other according to the laws of gravity and motion [37]. In comparison to other meta-heuristic algorithms, GSA has a low computational cost and a high convergence rate [38]. In addition, GSA has been broadly acknowledged in the literature for multimodal challenges, notably for clustering applications [39]. Moreover, GSA finds the best solution using only the current positions of the objects and is therefore considered a memory-less algorithm [36, 40]. However, due to a lack of population diversity and an inappropriate balance between exploration and exploitation, it often stagnates in local optima [41]. Researchers have proposed several variants of GSA. Liu et al. [42] suggested dynamically adapting inertia factors to improve the position update. Olivas et al. [43] introduced an interval type-2 fuzzy-system-based variant of GSA which improves the exploitation and exploration of the search space. The adaptive GSA (AGSA) presented by Mirjalili et al. [44] allows the exploitation of GSA to be modified based on the current situation. A variant of GSA called the exponential kbest gravitational search algorithm (eKGSA) was introduced by Mittal et al. [45] to find optimal thresholds for multi-level image segmentation. The hierarchical gravitational search algorithm proposed by Wang et al. [46] deals with premature convergence and low search capacity. Rawal et al. [47] presented a fast convergent GSA which utilizes a sigmoidal function and an exponential step size to accelerate convergence and exploitation. Recently, Mittal et al. [48] presented a new variant of GSA, the intelligent gravitational search algorithm (IGSA), which outperforms GSA in convergence rate and solution precision.

Therefore, the key contributions of the paper are twofold: (1) a new video-watermarking technique is proposed, termed the lossless video-watermarking technique using the intelligent gravitational search algorithm and Hessenberg transform in the linear wavelet transform domain (IGSA-LH), and (2) to attain equilibrium between imperceptibility and robustness, the intelligent gravitational search algorithm (IGSA) is leveraged to acquire an optimal set of multiple scaling factors. For experimental analysis, the proposed technique has been evaluated on four standard benchmark videos against 12 image and video attacks in terms of the imperceptibility parameters, namely mean peak signal-to-noise ratio (MPSNR) and mean structural similarity index (MSSIM), and the robustness parameter, i.e., mean normalized correlation (MNC). Further, the obtained results are compared with five existing video-watermarking techniques, namely Karmakar et al. [2], Bhardwaj et al. [49], Farri et al. [17], Kuraparthi et al. [24], and Agilandeeswari et al. [50].

The remainder of the paper is organized as follows. The preliminaries for the proposed technique are presented in the next section. The third section describes the proposed technique, followed by the experimental findings in the fourth section. Finally, the last section draws the conclusion.

Fig. 1 Linear wavelet transform

Preliminaries

Color space conversion from RGB to YUV

The RGB color space’s red (R), green (G), and blue (B) components are transformed to the YUV color space’s luminance (Y) and chrominance (U, V) components. The luminance components (Y) reflect the majority of the frame information and overall strength, while the chrominance components (U, V) indicate the color information of the frame [31]. The mathematical color conversion formula is given by Eq. (1):

$$\begin{aligned} \begin{bmatrix} {\text {Y}}\\ {\text {U}}\\ {\text {V}} \end{bmatrix}=\begin{bmatrix} 0\\ 127\\ 127 \end{bmatrix} + \begin{bmatrix} 0.2989 &{} 0.5866 &{} 0.1145\\ -0.1688 &{} -0.3312 &{} 0.5000\\ 0.5000 &{} -0.4184 &{} -0.0816 \end{bmatrix} \begin{bmatrix} {\text {R}}\\ {\text {G}}\\ {\text {B}} \end{bmatrix} \end{aligned}$$
(1)
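As an illustration, a minimal NumPy sketch of Eq. (1) is given below; the function name and the handling of the frame as a float array are our own choices rather than part of the original scheme:

```python
import numpy as np

# Conversion matrix and offset vector taken from Eq. (1)
M = np.array([[ 0.2989,  0.5866,  0.1145],
              [-0.1688, -0.3312,  0.5000],
              [ 0.5000, -0.4184, -0.0816]])
OFFSET = np.array([0.0, 127.0, 127.0])

def rgb2yuv(frame_rgb):
    """Convert an H x W x 3 RGB frame to its YUV channels via Eq. (1)."""
    rgb = np.asarray(frame_rgb, dtype=np.float64)
    return rgb @ M.T + OFFSET  # the offset is broadcast to every pixel
```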
Fig. 2 Arnold transform at different iterations: a Watermark logo, b \({\text {itr}}=50\), c \({\text {itr}}=75\), d \({\text {itr}}=100\)

Linear wavelet transform

DWT is a first-generation wavelet transform that yields floating-point coefficients. These coefficients may be altered during subsequent processing, so information is lost during watermark embedding due to the truncation of floating-point pixel values. To alleviate this, the linear wavelet transform (LWT), an extended version of DWT based on second-generation wavelets [52], is used. LWT replaces the up- and down-sampling of DWT with split and merge operations at each level; owing to this split-and-merge procedure, its computational complexity is nearly half that of DWT [51]. Like DWT, LWT splits an image into a low-frequency (LL) sub-band and high-frequency (LH, HL, HH) sub-bands. In contrast to DWT, LWT maps integer pixel values to integers, resulting in lossless, computationally faster, and reliable execution. LWT consists of three key steps, namely split, predict, and update, which are discussed below; a minimal code sketch follows the steps. The complete procedure of LWT is illustrated in Fig. 1.

1. Splitting: Consider an image F(m, n) that is split into two sections, i.e., even (\(F_e(m, n)\)) and odd (\(F_o(m, n)\)), defined by Eqs. (2) and (3), respectively:

   $$\begin{aligned} F_e(m, n) = F(m, 2n) \end{aligned}$$
   (2)
   $$\begin{aligned} F_o(m, n) = F(m, 2n+1) \end{aligned}$$
   (3)

2. Dual lifting (predict): The prediction operator (\(P^\star \)) estimates the odd part from the even coefficients in its local neighborhood. The prediction error, defined by Eq. (4), yields the high-frequency coefficients (h(m, n)), and the odd part is recovered from them by Eq. (5):

   $$\begin{aligned} h(m, n) = F_o(m, n)-P^\star [F_e(m, n)] \end{aligned}$$
   (4)
   $$\begin{aligned} F_o(m, n) = h(m, n)+P^\star [F_e(m, n)] \end{aligned}$$
   (5)

3. Primal lifting (update): Similarly, the low-frequency coefficients (\(h^\star (m, n)\)) are produced by modifying the even part with the update value (\(U_h(m, n)\)), as given by Eq. (6). The update value is obtained by applying the update operator U to the high-frequency coefficients, Eq. (7):

   $$\begin{aligned} h^\star (m, n) = F_e(m, n)+U_h(m, n) \end{aligned}$$
   (6)
   $$\begin{aligned} U_h(m, n) = U[h(m, n)] \end{aligned}$$
   (7)
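A minimal sketch of one lifting level along one axis is given below. The paper does not fix the predict/update operators, so this sketch assumes the simplest Haar-like pair (\(P^\star [F_e]=F_e\), \(U[h]=\lfloor h/2\rfloor \)); integer arithmetic makes the step exactly invertible, which is the lossless property the scheme relies on:

```python
import numpy as np

def lwt1d(x):
    """One lifting level along the last axis: split, predict, update."""
    x = np.asarray(x, dtype=np.int64)       # signed ints avoid uint8 wrap-around
    even, odd = x[..., 0::2], x[..., 1::2]  # split, Eqs. (2)-(3)
    h = odd - even                          # prediction error, Eq. (4)
    l = even + h // 2                       # update, Eqs. (6)-(7)
    return l, h

def ilwt1d(l, h):
    """Exactly invert the lifting level: undo update, undo predict, merge."""
    even = l - h // 2
    odd = even + h
    out = np.empty(l.shape[:-1] + (2 * l.shape[-1],), dtype=np.int64)
    out[..., 0::2], out[..., 1::2] = even, odd
    return out
```

Applying `lwt1d` along the rows and then along the columns of a frame yields the four sub-bands LL, LH, HL, and HH used in the embedding process.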

Arnold transform

Arnold transform (AT) is a technique for scrambling images that improves protection and helps identify the true owner [52]. The AT is used to scramble the watermark logo before embedding. It iteratively shuffles the pixel positions to create a new chaotic image, resulting in a scrambled image. An unauthorized user cannot retrieve the watermark logo from the watermarked image without knowing the security key. Figure 2 depicts the watermark logo and its scrambled version at different iterations. The figure shows that as the number of iterations increases, the watermark logo becomes more thoroughly scrambled. The AT is applied to a square watermark logo of dimension \(R_2\times R_2\) as depicted in Eq. (8):

$$\begin{aligned} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix}= \begin{bmatrix} 1 &{}\quad 1 \\ 1 &{}\quad 2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} {\text {mod}} \ R_2, \end{aligned}$$
(8)

where (a, b) denotes the original pixel coordinates of the watermark and \((a_0, b_0)\) the scrambled coordinates at the ith iteration.

The inverse Arnold transform (IAT) recovers the watermark logo through Eq. (9):

$$\begin{aligned} \begin{bmatrix} a \\ b \end{bmatrix}=\left( \begin{bmatrix} 2 &{} -1 \\ -1 &{} 1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix}+ \begin{bmatrix} R_2 \\ R_2 \end{bmatrix}\right) {\text {mod}}\ R_2 \end{aligned}$$
(9)
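A minimal NumPy sketch of AT and IAT follows; the number of iterations plays the role of the security key, and the function names are our own:

```python
import numpy as np

def arnold(img, itr):
    """Scramble a square R2 x R2 image by `itr` Arnold iterations, Eq. (8)."""
    n = img.shape[0]
    a, b = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    out = np.asarray(img)
    for _ in range(itr):
        a0, b0 = (a + b) % n, (a + 2 * b) % n   # cat map [[1, 1], [1, 2]]
        nxt = np.empty_like(out)
        nxt[a0, b0] = out[a, b]
        out = nxt
    return out

def inverse_arnold(img, itr):
    """Recover the logo by applying the inverse map of Eq. (9) `itr` times."""
    n = img.shape[0]
    a0, b0 = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    out = np.asarray(img)
    for _ in range(itr):
        a, b = (2 * a0 - b0) % n, (-a0 + b0) % n  # inverse map [[2, -1], [-1, 1]]
        nxt = np.empty_like(out)
        nxt[a, b] = out[a0, b0]
        out = nxt
    return out
```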

Hessenberg transform

The Hessenberg transform (HT) is a procedure for factorizing a general matrix (S) using orthogonal similarity transformations [53, 54]. The HT of a matrix (S) is given by Eq. (10):

$$\begin{aligned} {\text {HT}}[S]=Q\times H\times Q^{\text {T}}, \end{aligned}$$
(10)

where Q and H are the orthogonal and upper Hessenberg matrices, respectively, such that \(h_{ij}= 0\) for \(i > j + 1\). Usually, the Hessenberg transform is performed via Householder matrices. The Householder matrix (P) is an orthogonal matrix defined by Eq. (11):

$$\begin{aligned} P=I_r-\frac{2uu^{\text {T}}}{u^{\text {T}}u}, \end{aligned}$$
(11)

where \(I_r\) and u are \(m\times m\) identity matrix and nonzero vector in \(R^{m}\), respectively. The overall procedure consists of \(m-2\) steps for a matrix (S) of size \(m\times m\). Therefore, H is computed by Eq. (12):

$$\begin{aligned} H=(P_1P_2\cdots P_{m-3}P_{m-2})^{\text {T}} \times S\,(P_1P_2\cdots P_{m-3}P_{m-2}) \end{aligned}$$
(12)
$$\begin{aligned} \implies H=Q^{\text {T}}\times S\times Q \end{aligned}$$
(13)
$$\begin{aligned} \implies S=Q\times H\times Q^{\text {T}}, \end{aligned}$$
(14)

where \(Q=P_1P_2\cdots P_{m-3}P_{m-2}\).

For example, HT of a matrix (S) of size \(4\times 4\) is computed as follows:

$$\begin{aligned} S=\begin{bmatrix} 228 &{}\quad 39 &{}\quad 208 &{}\quad 51 \\ 245 &{}\quad 66 &{}\quad 63 &{}\quad 65 \\ 140 &{}\quad 215 &{}\quad 237 &{}\quad 158 \\ 36 &{}\quad 65 &{}\quad 90 &{}\quad 121 \\ \end{bmatrix} \end{aligned}$$
(15)

After performing HT on a matrix (S), matrix (Q) and matrix (H) are given as follows:

$$\begin{aligned} Q=\begin{bmatrix} 1.0000 &{}\quad 0 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad -0.8613 &{}\quad 0.5039 &{}\quad 0.0656 \\ 0 &{}\quad -0.4921 &{}\quad -0.7950 &{}\quad -0.3546 \\ 0 &{}\quad -0.1266 &{}\quad -0.3377 &{}\quad 0.9327 \\ \end{bmatrix} \end{aligned}$$
(16)
$$\begin{aligned} H=\begin{bmatrix} 228.0000 &{}\quad -142.4106 &{}\quad -162.9320 &{}\quad -23.6301 \\ -284.4662 &{}\quad 255.7505 &{}\quad 109.1450 &{}\quad -85.5923 \\ 0 &{}\quad 246.4589 &{}\quad 113.4365 &{}\quad -68.8549 \\ 0 &{}\quad 0 &{}\quad -29.5251 &{}\quad 54.8130 \\ \end{bmatrix} \end{aligned}$$
(17)
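The factorization can be reproduced with SciPy, as sketched below; the decomposition is unique only up to column signs, so individual entries may differ in sign from Eqs. (16) and (17):

```python
import numpy as np
from scipy.linalg import hessenberg

S = np.array([[228.,  39., 208.,  51.],
              [245.,  66.,  63.,  65.],
              [140., 215., 237., 158.],
              [ 36.,  65.,  90., 121.]])

H, Q = hessenberg(S, calc_q=True)  # H upper Hessenberg, Q orthogonal

# The similarity relation of Eqs. (10)-(14) holds up to round-off
assert np.allclose(Q @ H @ Q.T, S)
```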
Fig. 3 The proposed video-watermark embedding and extraction process

Intelligent gravitational search algorithm (IGSA)

Mittal et al. [48] proposed a variant of the gravitational search algorithm, the intelligent gravitational search algorithm (IGSA), to improve solution precision. IGSA focuses on enhancing the exploitation ability of the objects. To do so, the position equation is modified to include the global best solution (gBest) and global worst solution (gWorst); the attraction of objects toward gBest is scaled relative to gWorst. Mathematically, the modified position equation of IGSA for the ith object in the dth dimension at the tth iteration is given in Eq. (18):

$$\begin{aligned} x_i^d(t+1) = x_i^d(t) + v_i^d(t+1) + \underbrace{r(t) \times \frac{{\text {gBest}}^d(t)-x_i^d(t)}{\left| \rho \times {\text {gWorst}}^d(t)-x_i^d(t)\right| }}_{{\text {Intelligent component}}}, \end{aligned}$$
(18)

where \(r(t) \in [0, 1]\) and \(\rho \) is a constant whose value is taken as 0.9.
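A sketch of this update rule is given below; the small epsilon guarding the denominator is our own safeguard against division by zero and is not part of Eq. (18):

```python
import numpy as np

def igsa_position_update(x, v, g_best, g_worst, rho=0.9, eps=1e-12, rng=None):
    """Eq. (18): standard GSA move plus the 'intelligent' pull toward gBest.

    x, v    : (n_agents, dim) current positions and updated velocities
    g_best  : (dim,) global best solution found so far
    g_worst : (dim,) global worst solution found so far
    """
    rng = rng or np.random.default_rng()
    r = rng.random()                          # r(t) drawn from [0, 1]
    denom = np.abs(rho * g_worst - x) + eps   # |rho * gWorst - x|
    intelligent = r * (g_best - x) / denom
    return x + v + intelligent
```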

Fig. 4 The workflow of identification of motion frames and keyframes

Fig. 5 The extracted keyframes corresponding to a Silent, b Foreman, c Mobile, d Hall monitor

Proposed technique

The proposed technique, termed as a lossless video-watermarking technique using intelligent gravitational search algorithm and Hessenberg transform in linear wavelet transform (IGSA-LH), is explained in the following four sections: identification of motion frames and keyframes (“Identification of motion frames and keyframes”), embedding process (“Embedding process”), extraction process (“Extraction process”), and finding optimal scaling factors through IGSA (“Selection of multiple scaling factors using IGSA algorithm”). The overall procedure of the proposed technique is illustrated in Fig. 3.

Identification of motion frames and keyframes

Motion frames are identified by applying the histogram difference method to the cover video [55]. The complete keyframe selection procedure is illustrated in Fig. 4. In this method, the absolute histogram difference between consecutive video frames is calculated; if it exceeds a predefined threshold, the frame is identified as a motion frame. Keyframes are then detected from the selected motion frames based on entropy: if the entropy of a motion frame is greater than the average entropy of all motion frames, that motion frame is identified as a keyframe. Figure 5 depicts the identified keyframes corresponding to each cover video. The detailed workflow for identifying motion frames and keyframes is given below (a condensed code sketch follows the steps):

1. The cover video (V) is divided into k frames of size \(R_1\times R_1\), namely \(f_1, f_2, \ldots , f_k\).

2. Compute the sum of absolute histogram differences (\(|h_1-h_2|\)) between each pair of adjacent frames.

3. Compare the sum of absolute differences to a predefined threshold [55]. If the sum exceeds the threshold, the frame is marked as a motion frame (MF).

4. Continue until the last frame of the video.

5. Calculate the entropy of all selected motion frames.

6. If the entropy of a single frame is greater than the average entropy of the motion frames, that frame is selected as a keyframe (\(k_f\)).
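The sketch below condenses these steps, assuming grayscale frames; the threshold is the predefined value of [55] and is left as a parameter:

```python
import numpy as np

def select_keyframes(frames, threshold):
    """Return indices of keyframes: motion frames whose entropy exceeds
    the average entropy of all motion frames."""
    hists = [np.histogram(f, bins=256, range=(0, 256))[0] for f in frames]
    motion = [i for i in range(1, len(frames))
              if np.abs(hists[i] - hists[i - 1]).sum() > threshold]

    def entropy(f):
        p = np.bincount(f.ravel(), minlength=256) / f.size
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    ent = {i: entropy(frames[i]) for i in motion}
    avg = np.mean(list(ent.values())) if ent else 0.0
    return [i for i in motion if ent[i] > avg]
```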

Embedding process

In this section, the scrambled watermark is incorporated into the keyframes. The following steps are performed while embedding the watermark (a code sketch of the core embedding steps follows the list):

1. Convert the keyframe (\(K_f\)) into YUV color space and select the chrominance component (\(K_{v}\)) by Eq. (19):

   $$\begin{aligned}{}[K_{y}\ K_{u}\ K_{v}]={\text {rgb2yuv}}(K_f) \end{aligned}$$
   (19)

2. The chrominance component (\(K_{v}\)) is decomposed into four sub-bands (\({\text {LL}}_v, {\text {LH}}_v, {\text {HL}}_v, {\text {HH}}_v\)) by performing 1-level LWT, Eq. (20); the sub-band (\({\text {LL}}_v\)) is considered for embedding:

   $$\begin{aligned}{}[{\text {LL}}_v, {\text {LH}}_v, {\text {HL}}_v, {\text {HH}}_v]={\text {LWT}}(K_{v}) \end{aligned}$$
   (20)

3. Perform HT on sub-band \({\text {LL}}_v\) using Eq. (21):

   $$\begin{aligned} {\text {HT}}[{\text {LL}}_v]=Q_v\times H_v\times Q^{\text {T}}_v \end{aligned}$$
   (21)

4. Select the watermark logo (w) of size (\(R_2\times R_2\)) and perform AT to obtain the scrambled watermark (\(s_w\)) for security enhancement.

5. The scrambled watermark (\(s_w\)) is decomposed into four sub-bands by performing 1-level LWT, Eq. (22), and sub-band \({\text {LL}}_w\) is selected for further processing:

   $$\begin{aligned}{}[{\text {LL}}_w, {\text {LH}}_w, {\text {HL}}_w, {\text {HH}}_w]={\text {LWT}}(s_{w}) \end{aligned}$$
   (22)

6. Perform HT on sub-band \({\text {LL}}_w\) using Eq. (23):

   $$\begin{aligned} {\text {HT}}[{\text {LL}}_w]=Q_w\times H_w\times Q^{\text {T}}_w \end{aligned}$$
   (23)

7. Apply IGSA to obtain the set of MSF (\(\alpha \)) according to the procedure detailed in "Selection of multiple scaling factors using IGSA algorithm".

8. Embed the component \(H_w\) into \(H_v\) using Eq. (24):

   $$\begin{aligned} {{\tilde{H}}}_v = H_v+\alpha \times H_w \end{aligned}$$
   (24)

9. Apply inverse HT (IHT) on the modified matrix (\({{\tilde{H}}}_v\)) to obtain the Hessenberg-embedded matrix (\(\widetilde{{\text {LL}}}_v\)) by Eq. (25):

   $$\begin{aligned} \widetilde{{\text {LL}}}_v=Q_v \times {{\tilde{H}}}_v \times Q^{\text {T}}_v \end{aligned}$$
   (25)

10. Apply inverse LWT (ILWT) on matrix (\(\widetilde{{\text {LL}}}_v\)) to obtain the watermarked keyframe (\(W_v\)), Eq. (26):

    $$\begin{aligned} W_v={\text {ILWT}}[\widetilde{{\text {LL}}}_v, {\text {LH}}_v, {\text {HL}}_v, {\text {HH}}_v] \end{aligned}$$
    (26)

11. Finally, convert all YUV watermarked keyframes back to RGB and merge them with the remaining video frames to obtain the watermarked video (\(V^\star \)).
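A condensed sketch of the HT-domain embedding core (steps 3, 6, 8, and 9) for a single keyframe is given below. The color conversion, LWT, and Arnold steps are elided, and the returned side information reflects that extraction, as described in the next section, needs the original \(H_v\) and \(Q_w\) (i.e., the scheme is non-blind):

```python
import numpy as np
from scipy.linalg import hessenberg

def embed_keyframe(LL_v, LL_w, alpha):
    """Embed the watermark's Hessenberg component into the keyframe's.

    LL_v  : LL sub-band of the keyframe's V channel after 1-level LWT
    LL_w  : LL sub-band of the scrambled watermark after 1-level LWT
    alpha : scaling factor chosen by IGSA for this frame
    """
    H_v, Q_v = hessenberg(LL_v, calc_q=True)   # Eq. (21)
    H_w, Q_w = hessenberg(LL_w, calc_q=True)   # Eq. (23)
    H_mod = H_v + alpha * H_w                  # Eq. (24)
    LL_mod = Q_v @ H_mod @ Q_v.T               # Eq. (25), inverse HT
    return LL_mod, (H_v, Q_w)                  # side information for extraction
```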

Extraction process

In extraction, the reverse of the embedding process is performed to recover the watermark logo, as presented below (a matching code sketch follows the list):

1. Divide the RGB watermarked video (\(V^\star \)) into k frames, namely \(f^\star _1, f^\star _2, \ldots , f^\star _k\), and extract the keyframes (\(k^\star _f\)).

2. Convert the watermarked keyframes (\(k^\star _f\)) into YUV color space and select the chrominance component (\(K^\star _v\)).

3. The chrominance component (\(K^\star _{v}\)) is decomposed into four sub-bands (\({\text {LL}}^\star _v, {\text {LH}}^\star _v, {\text {HL}}^\star _v, {\text {HH}}^\star _v\)) by performing 1-level LWT, Eq. (27); the sub-band (\({\text {LL}}^\star _v\)) is considered for extraction:

   $$\begin{aligned}{}[{\text {LL}}^\star _v, {\text {LH}}^\star _v, {\text {HL}}^\star _v, {\text {HH}}^\star _v]={\text {LWT}}(K^\star _v) \end{aligned}$$
   (27)

4. Apply HT on sub-band \({\text {LL}}^\star _v\) of the watermarked keyframes by Eq. (28):

   $$\begin{aligned} {\text {HT}}[{\text {LL}}^\star _v]=Q^\star _v\times H^\star _v\times {Q^\star _v}^{\text {T}} \end{aligned}$$
   (28)

5. Extract the watermark component using Eq. (29):

   $$\begin{aligned} H^\star _w= \frac{H^\star _v-H_v}{\alpha } \end{aligned}$$
   (29)

6. Perform IHT on matrix (\(H^\star _w\)) using Eq. (30):

   $$\begin{aligned} {\text {LL}}^\star _w=Q_w\times H^\star _w\times Q^{\text {T}}_w \end{aligned}$$
   (30)

7. Apply ILWT on matrix (\({\text {LL}}^\star _w\)) to obtain the extracted, still-scrambled watermark (\({\text {LL}}_{ew}\)) using Eq. (31):

   $$\begin{aligned} {\text {LL}}_{ew}={\text {ILWT}}[{\text {LL}}^\star _w, {\text {LH}}_w, {\text {HL}}_w, {\text {HH}}_w] \end{aligned}$$
   (31)

8. Finally, apply IAT to recover the watermark logo (\(w^\star \)).
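The matching sketch of the extraction core (steps 4-6) is shown below; under no attack it inverts the embedding sketch above, up to the sign ambiguity of the Hessenberg factorization:

```python
import numpy as np
from scipy.linalg import hessenberg

def extract_watermark(LL_v_star, H_v, Q_w, alpha):
    """Recover the LL sub-band of the scrambled watermark from a
    (possibly attacked) watermarked keyframe sub-band LL_v_star."""
    H_v_star, _ = hessenberg(LL_v_star, calc_q=True)  # Eq. (28)
    H_w_star = (H_v_star - H_v) / alpha               # Eq. (29)
    return Q_w @ H_w_star @ Q_w.T                     # Eq. (30), inverse HT
```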

Fig. 6 Watermarked video frames and their corresponding extracted watermark logos: a Silent, b Foreman, c Mobile, d Hall monitor

Table 2 The performance of the proposed technique under various attacks for considered videos in terms of robustness
Fig. 7 The quality of watermarked video frames and extracted watermark logos after various attacks

Selection of multiple scaling factors using IGSA algorithm

The scaling factor is a critical parameter in watermark embedding, as it regulates imperceptibility and robustness simultaneously. However, a single scaling factor fails to maintain the trade-off between these two parameters. The IGSA algorithm is therefore employed to identify the optimal set of MSF through the objective function given in Eq. (32). The complete procedure for determining the optimal scaling factors by IGSA is given below (a sketch of the fitness evaluation follows the steps):

1. IGSA initializes a random population, where each individual consists of d scaling factors.

2. The objective function described by Eq. (32) is used to calculate the fitness of each individual:

   $$\begin{aligned} {\text {OF}}(i, j)_{\text {max}}=\frac{{\text {MSSIM}} (f, {{\tilde{f}}}) + {\text {MNC}} (f, {{\tilde{f}}})}{2} +\frac{\sum \nolimits _{x=1}^K {\text {MNC}} (w, {{\tilde{w}}}_x)}{K} \end{aligned}$$
   (32)

   where MSSIM (\(f, {{\tilde{f}}}\)) and MNC (\(f, {{\tilde{f}}}\)) evaluate the mean structural similarity index and mean normalized correlation between the cover video frames (f) and the watermarked video frames (\({{\tilde{f}}}\)), MNC (\(w, {{\tilde{w}}}_x\)) measures the mean normalized correlation between the watermark logo (w) and the watermark extracted after the xth attack (\({{\tilde{w}}}_x\)), and K denotes the total number of performed attacks.

3. Each individual of IGSA is updated according to Eq. (18).

4. This process continues until the maximum number of iterations is reached.
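A sketch of the fitness evaluation of Eq. (32) is given below; producing its inputs requires embedding with the candidate scaling factors, simulating the K attacks, and extracting the watermark after each one, which is elided here:

```python
def fitness(mssim_frames, mnc_frames, mnc_after_attacks):
    """Eq. (32), to be maximized by IGSA.

    mssim_frames      : MSSIM between cover and watermarked frames
    mnc_frames        : MNC between cover and watermarked frames
    mnc_after_attacks : list of K MNC values between the watermark logo
                        and the logo extracted after each attack
    """
    K = len(mnc_after_attacks)
    return (mssim_frames + mnc_frames) / 2 + sum(mnc_after_attacks) / K
```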

Results and discussions

Experiments are simulated in MATLAB 2020a on a system with a 2.5 GHz Intel i5 processor and 8 GB RAM. The efficacy of the proposed technique (IGSA-LH) has been evaluated on four standard benchmark videos, Silent, Foreman, Mobile, and Hall monitor, in terms of imperceptibility and robustness. Imperceptibility is examined between the cover video and the watermarked video using MPSNR and MSSIM, while robustness is evaluated between the watermark logo and the extracted watermark logo using MNC. The cover videos and watermark logo are taken from online databases [56, 57]. The parameter settings of the considered techniques are taken from the respective literature. The experimental results are organized as follows: "Imperceptibility analysis of the proposed technique" analyzes the imperceptibility, while robustness is examined against the considered attacks in "Robustness analysis of the proposed technique". "Performance analysis against existing techniques" presents the comparative analysis of the proposed technique (IGSA-LH) with five recent video-watermarking schemes under the considered attacks in terms of MPSNR, MSSIM, and MNC values. In addition, statistical validation of the proposed technique is discussed in "Statistical analysis of the proposed technique". Finally, "Comparative analysis of time complexity" compares the considered techniques in terms of time complexity.

Imperceptibility analysis of the proposed technique

Imperceptibility evaluates the quality of the watermarked video frames and is assessed between the cover video frames (f) and the watermarked video frames (\({{\tilde{f}}}\)). The performance of the proposed technique (IGSA-LH) has been examined on all four considered videos in terms of MPSNR and MSSIM, defined by Eqs. (33) and (34), respectively. Figure 6 shows the quality of the watermarked video frames, which are visually very close to the keyframes depicted in Fig. 5. The proposed technique attains high MPSNR values of 47.65, 48.97, 47.98, 48.07, and 48.52 and MSSIM values of 0.9997, 0.9998, 0.9997, 0.9993, and 0.9998 for the considered videos. The figure indicates that the proposed technique performs effectively in the embedding phase.

$$\begin{aligned} {\text {MPSNR}} (f, {{\tilde{f}}}) = \frac{1}{F}\sum \limits _{i=1}^F {\text {PSNR}}_i \end{aligned}$$
(33)
$$\begin{aligned} {\text {MSSIM}} (f, {{\tilde{f}}}) = \frac{1}{F}\sum \limits _{i=1}^F {\text {SSIM}}_i, \end{aligned}$$
(34)

where F denotes the total number of video frames.
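A sketch of Eqs. (33) and (34) using the scikit-image metrics is given below, assuming 8-bit grayscale frames:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def mpsnr_mssim(cover_frames, marked_frames):
    """Frame-averaged PSNR and SSIM over F frame pairs, Eqs. (33)-(34)."""
    psnrs = [peak_signal_noise_ratio(f, g, data_range=255)
             for f, g in zip(cover_frames, marked_frames)]
    ssims = [structural_similarity(f, g, data_range=255)
             for f, g in zip(cover_frames, marked_frames)]
    return float(np.mean(psnrs)), float(np.mean(ssims))
```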

Table 3 The imperceptibility (MPSNR, MSSIM) of the considered techniques under no attack

Robustness analysis of the proposed technique

Robustness measures the quality of the extracted watermark logo under various image and video attacks. It is assessed between the original watermark logo and the extracted watermark logo in terms of MNC, defined by Eq. (35). A total of 12 attacks are applied to the watermarked video to evaluate the robustness of the proposed technique: (i) median filtering (\(3\times 3\)), (ii) Wiener filtering (\(3\times 3\)), (iii) Gaussian filtering (\(3\times 3\)), (iv) rotation (\(45^\circ \)), (v) translation (30, 30), (vi) cropping (center), (vii) sharpening, (viii) gamma correction (\(\gamma =0.6\)), (ix) histogram equalization, (x) Gaussian noise (0, \(10\%\)), (xi) salt and pepper noise (\(10\%\)), and (xii) Poisson noise. Table 2 tabulates the MNC values for all four considered videos against each attack; the proposed technique attains superior MNC values throughout. The quality of the extracted watermark logos corresponding to each attack for all videos is illustrated in Fig. 7. The figure shows that the proposed technique recovers a high-quality watermark against every attack except Gaussian noise, salt and pepper noise, and Poisson noise:

$$\begin{aligned} {\text {MNC}} (w, {{\tilde{w}}}) =\frac{1}{F}\sum \limits _{i=1}^F {\text {NC}}_i \end{aligned}$$
(35)
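A matching sketch of Eq. (35) is shown below, with NC computed as the normalized inner product of the original and extracted logos:

```python
import numpy as np

def nc(w, w_ext):
    """Normalized correlation between original and extracted watermark."""
    w = np.asarray(w, dtype=float).ravel()
    w_ext = np.asarray(w_ext, dtype=float).ravel()
    return (w @ w_ext) / np.sqrt((w @ w) * (w_ext @ w_ext))

def mnc(w, extracted_logos):
    """Eq. (35): NC averaged over the logos extracted from F keyframes."""
    return float(np.mean([nc(w, e) for e in extracted_logos]))
```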
Fig. 8 Comparative analysis of the IGSA-LH technique against considered techniques in terms of imperceptibility

Performance analysis against existing techniques

The proposed technique (IGSA-LH) has been compared with five recent video-watermarking techniques, namely Karmakar et al. [2], Bhardwaj et al. [49], Farri et al. [17], Kuraparthi et al. [24], and Agilandeeswari et al. [50]. "Comparative analysis of imperceptibility" discusses the imperceptibility of the proposed technique against the existing techniques in terms of MPSNR and MSSIM, while "Comparative analysis of robustness" studies the robustness of the considered techniques in terms of MNC under 12 video attacks.

Comparative analysis of imperceptibility

Table 3 highlights the MPSNR and MSSIM values corresponding to each video for the compared schemes. The table shows that the proposed technique attains an average MPSNR value of 48.39 and an average MSSIM value of 0.9996. These superior values confirm that the proposed technique outperforms the compared schemes. For a visual comparison, the MPSNR values of the considered methods over all four videos are plotted in Fig. 8.

Comparative analysis of robustness

Tables 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15 illustrate the comparative analysis of the robustness of the considered techniques against the 12 video attacks.

(i) Median filtering attack: Median filtering with kernel size \(3\times 3\) is performed on the watermarked video frames. Table 4 compiles the individual and average MNC values of the four videos for all considered schemes. The proposed technique achieves individual MNC values of (0.9990, 0.9984, 0.9991, 0.9986) for the Silent, Foreman, Mobile, and Hall monitor videos and an average MNC value of 0.9988 under this attack. These results confirm that the IGSA-LH technique outperforms the compared schemes against median filtering.

(ii) Wiener filtering attack: To evaluate the robustness, the watermarked video frames are attacked by Wiener filtering. Table 5 compiles the individual and averaged MNC values for the four videos against each considered scheme. The averaged MNC values attained by the considered schemes are (0.9993, 0.9830, 0.9956, 0.9986, 0.9880, 0.9869). The average MNC value of the IGSA-LH technique is superior to the compared schemes, which demonstrates its robustness under the Wiener filtering attack.

(iii) Gaussian filtering attack: Gaussian filtering with kernel size \(3\times 3\) is applied to the watermarked video frames. Table 6 presents the average MNC values (0.9992, 0.9837, 0.9965, 0.9990, 0.9910, 0.9876) corresponding to the considered schemes over the four videos. The table confirms that the proposed technique attains a higher average MNC value than the compared schemes. Therefore, the proposed technique performs effectively under Gaussian filtering.

(iv) Rotation attack: The watermarked video is rotated clockwise by \(45^\circ \) to test robustness. For this attack, the average MNC values are (0.9997, 0.9835, 0.9951, 0.9960, 0.9921, 0.9884), as tabulated in Table 7. The proposed technique's average MNC value is superior to the compared schemes; hence, it outperforms them under this attack.

(v) Translation attack: To evaluate the effectiveness of the proposed technique, a translation attack (30, 30) shifts the pixels of the watermarked video. Table 8 lists the average MNC values for each considered scheme, namely (0.9996, 0.9834, 0.9950, 0.9958, 0.9920, 0.9890), which confirm that the proposed technique attains the highest average MNC value among the compared schemes. Therefore, the proposed technique performs outstandingly under this attack.

(vi) Cropping attack: In the cropping attack, \(25\%\) of each watermarked video frame is cropped from the center. The average MNC values obtained by each scheme under this attack are (0.9993, 0.9833, 0.9947, 0.9957, 0.9904, 0.9868), as shown in Table 9. The proposed technique attains the highest average MNC value, so the IGSA-LH technique is quite effective under cropping.

(vii) Sharpening attack: A sharpening attack is performed on the watermarked video frames. The average MNC values of the considered schemes under this attack are (0.9988, 0.9828, 0.9951, 0.9956, 0.9932, 0.9934), as shown in Table 10, where the highest value, attained by the proposed technique, is reflected in bold. Hence, the proposed technique is quite resilient to sharpening.

(viii) Histogram equalization attack: Histogram equalization is applied to the watermarked video frames to compare robustness. Under this attack, the considered schemes attain average MNC values of (0.9979, 0.9824, 0.9948, 0.9954, 0.9926, 0.9909), respectively, as depicted in Table 12. The proposed technique attains the maximum average MNC value (0.9979), which confirms its effectiveness under this attack.

(ix) Gamma correction attack: The contrast of the watermarked video frames is modified by gamma correction (0.6), and the average MNC values under this attack are presented in Table 11, namely (0.9982, 0.9825, 0.9951, 0.9955, 0.9912, 0.9888) for the considered schemes. These results confirm that the average MNC value is highest for the proposed technique.

(x) Salt and pepper noise attack: The watermarked video frames are attacked by salt and pepper noise (\(1\%\)), and the average MNC values corresponding to each scheme are depicted in Table 14. Farri et al. [17] attains the maximum average MNC value of 0.9954. The proposed technique still performs better than Karmakar et al. [2]; however, Farri et al. [17] outperforms all schemes under this attack.

(xi) Gaussian noise attack: In the Gaussian noise attack, the watermarked video frames are corrupted with \(10\%\) noise density. Table 13 depicts the average MNC values (0.9903, 0.9818, 0.9919, 0.9953, 0.9915, 0.9911). The IGSA-LH technique performs better than Karmakar et al. [2]; however, Farri et al. [17] attains the highest average MNC value of 0.9953, which confirms that this scheme outperforms under the Gaussian noise attack.

(xii) Poisson noise attack: A Poisson noise attack is performed on the watermarked video frames to compare robustness. The average MNC values for each scheme, (0.9910, 0.9823, 0.9922, 0.9960, 0.9925, 0.9930), are given in Table 15. It can be observed that Farri et al. [17] also outperforms under this attack.

Table 4 The robustness (MNC) of the considered techniques under median filtering attack for considered videos
Table 5 The robustness (MNC) of the considered techniques under wiener filtering attack for considered videos
Table 6 The robustness (MNC) of the considered techniques under Gaussian filtering attack for considered videos
Table 7 The robustness (MNC) of the considered techniques under rotation attack for considered videos
Table 8 The robustness (MNC) of the considered techniques under translation attack for considered videos
Table 9 The robustness (MNC) of the considered techniques under cropping attack for considered videos
Table 10 The robustness (MNC) of the considered techniques under sharpening attack for considered videos
Table 11 The robustness (MNC) of the considered techniques under gamma correction attack for considered videos
Table 12 The robustness (MNC) of the considered techniques under histogram equalization attack for considered videos
Table 13 The robustness (MNC) of the considered techniques under Gaussian noise attack for considered videos
Table 14 The robustness (MNC) of the considered techniques under salt and pepper noise attack for considered videos
Table 15 The robustness (MNC) of the considered techniques under Poisson noise attack for considered videos
Table 16 Friedman’s test of the proposed technique and considered techniques over MPSNR, MSSIM, and MNC
Table 17 Time complexity with considered video-watermarking techniques (in s)

Moreover, the average MNC values of all considered videos corresponding to each attack are also presented graphically in Fig. 9 for better visualization. The figure illustrates that the proposed technique outperforms the compared schemes under all considered attacks except the noise attacks. Therefore, the proposed technique is resilient to the considered attacks.

Fig. 9 Comparative analysis of the IGSA-LH technique against considered techniques in terms of robustness

Statistical analysis of the proposed technique

To statistically validate the performance of the proposed technique (IGSA-LH), a non-parametric Friedman's test is performed for each considered performance parameter, i.e., MPSNR, MSSIM, and MNC. The test involves two hypotheses: the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_1\)). Under \(H_0\), the parameter values generated by the compared methods are statistically equivalent, while \(H_1\) states that they differ significantly. The p value returned by the test over 30 different executions is 0.01, which is less than the considered significance level (\(\alpha =0.05\)). Therefore, \(H_0\) is rejected and the obtained results are significantly different. Further, Table 16 depicts the ranking of the considered techniques for each parameter, where the proposed technique ranks first. This confirms that the proposed technique is statistically better than the considered techniques in terms of each parameter.
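For reference, the test can be reproduced with SciPy as sketched below; the score matrix is placeholder data standing in for the 30 recorded executions of each technique:

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
# Placeholder: rows = 30 executions, one column of parameter values
# (e.g., MNC) per technique; replace with the measured results.
scores = rng.uniform(0.97, 1.0, size=(30, 6))

stat, p_value = friedmanchisquare(*scores.T)
print(f"p = {p_value:.4f}, reject H0: {p_value < 0.05}")
```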

Comparative analysis of time complexity

The proposed technique has also been compared with the five recent state-of-the-art schemes in terms of embedding and extraction time for 300 video frames of each considered video. Table 17 reports the embedding and extraction times in seconds. The table confirms that the proposed technique requires only a few seconds, less than the compared techniques. These results show that the proposed technique has low time complexity, making it suitable for real-time video watermarking. Hence, the proposed technique is computationally efficient.

Conclusion

This paper presents a lossless and efficient video-watermarking technique based on optimal keyframe selection using IGSA and HT in the LWT domain. In this scheme, a scrambled watermark logo is incorporated into the keyframes following a one-level LWT. The IGSA algorithm acquires a set of MSF which maintains the equilibrium between imperceptibility and robustness. The security of the IGSA-LH technique is further improved by performing the Arnold transform on the watermark logo prior to embedding. The experimental results were validated against 12 video attacks and compared with five recent state-of-the-art schemes. The comparative analysis of imperceptibility and robustness validates that the IGSA-LH technique is quite resilient to attacks. In future work, the performance of the proposed technique can be evaluated on color watermark logos over different video attacks along with various parameters. Further, the elliptic curve cryptography (ECC) technique can be applied to the proposed technique for security enhancement. Moreover, the applicability of the proposed technique can be extended to big-data applications.