1 Introduction

With the fast evolution of network technology and multimedia applications, video applications such as video-on-demand (VOD), video meetings, pay-TV, and video surveillance have become widely used. Because video transmission relies on different networks, the video content may be intercepted owing to the open nature of public channels. Securing colored videos during transmission and storage has therefore become a challenging topic in recent years. The general video security objectives are availability, integrity, and confidentiality [1]. In general, three methods, i.e., video encryption (cryptography) [2,3,4], video steganography [5,6,7,8], and video watermarking [9, 10], can be used to achieve security. Cryptography is widely regarded as the most effective technique for protecting colored videos: it converts the raw video into an unintelligible form using a secret key, and the plain video can be restored only with knowledge of that key. Video encryption techniques rely on the two building blocks proposed by Shannon, confusion and diffusion [11].

In general, image and video encryption algorithms are divided into full encryption and compression-combined encryption (selective encryption) [12]. Each has its advantages and limitations. In full encryption [13,14,15,16,17,18,19], the whole image or video content is encrypted directly, as shown in Fig. 1b. Full encryption algorithms can be applied to uncompressed videos or to videos compressed with any compression method [20]. They provide a high level of security but require a long processing time, and they are used in sensitive applications such as military and medical applications. In selective encryption [21,22,23], only part of the video data is distorted by the encryption process, so the encrypted video remains partially intelligible, as shown in Fig. 1c. Selective encryption is used in applications that require a low processing time. The proposed scheme aims to combine the advantages of both approaches to achieve good encryption quality with a low processing time.

Fig. 1 Full and selective encryption methods

Because of the high correlation between neighboring pixels within a video frame and the strong relationship between consecutive video frames, traditional algorithms such as AES and DES cannot guarantee high performance and a low processing time for video encryption; they are also unsuitable for encrypting colored video in real time [24]. Therefore, several algorithms for multimedia encryption have been proposed [25,26,27,28,29,30,31,32,33,34]. These algorithms, introduced by various researchers, use techniques such as DNA encoding and chaotic maps to encrypt images and videos securely and robustly. The most recent multimedia encryption methods are summarized in this section.

In [13], Li et al. presented a video encryption scheme that uses different chaotic algorithms depending on the amount of information in each channel of a video frame. The video file is divided into a video stream and an audio stream, and the video stream is converted into the YCbCr color space. The Y channel is encrypted with the Arnold map and a DNA encoding algorithm, and the other channels are encrypted with the Lorenz hyperchaotic map; however, this scheme requires a long processing time. Yasser et al. proposed a multimedia encryption scheme based on hybrid chaotic maps [19]; the cryptosystem supports different media types such as video, images, speech, and text. Alarifi et al. [16] developed a hybrid cryptosystem for compressed video files based on chaotic maps, DNA sequences, and a modified Mandelbrot set. The scheme uses the Arnold map to generate three keys, which are then encoded with DNA sequences. The Hamming distance between the keys and a compressed YCbCr video frame is computed, the result is encoded, and the confusion and diffusion principles are applied. Valli and Ganesan [35] implemented a video encryption system that uses a substitution box to achieve diffusion and combines two schemes: the first is a higher-dimensional 12D chaotic structure, and the second uses the Ikeda delay differential equation. The drawbacks of this system are the complexity of the key and the long encryption time. Kumar et al. [36] suggested a secure chaos-based scheme for video encryption that provides three levels of security: random frame selection, frame permutation, and frame diffusion. In [37], Song et al. proposed a secure scheme for encrypting quantum videos that consists of three steps. First, the inter-frame positions are permuted based on keys generated from an improved logistic map. Second, a geometric transformation and the improved logistic map change the positions of the intra-frame pixels. Finally, quantum controlled-XOR operations and the improved logistic map are used to encrypt the four high intra-frame qubit planes. In [38], Ye et al. used frequency-domain encryption. First, the original image is transformed with the discrete wavelet transform and then compressed. Then, the carrier image is processed by the lifting wavelet transform and the discrete cosine transform together with a Schur decomposition, and a visually meaningful encrypted image is obtained by a final embedding operation. Encryption in the frequency domain improves efficiency, but the frequency-domain transformation leads to data loss. In [39], each channel of a color image was encrypted with multi-parameter fractional discrete Tchebyshev moments. In [40], Gong et al. studied four-dimensional chaotic systems for image encryption applications. In [41], an opto-digital color image encryption scheme was proposed that combines a compound chaotic map, the reality-preserving fractional Hartley transform, and a piecewise linear chaotic map for pixel replacement, optical processing, and permutation; this technique offers high key sensitivity and improved protection.

The schemes reviewed above still have some drawbacks and vulnerabilities: (1) the running time of the related algorithms is high and does not meet the requirements of real-time applications; (2) some related algorithms are complex and unsuitable for IoT devices; (3) some related algorithms are evaluated only on test images rather than on test videos; and (4) some related algorithms do not investigate the effect of different noises in the security performance analysis. Motivated by these points, this paper introduces a new scheme for securing colored video with high-quality encryption that addresses these shortcomings. The proposed scheme consists of a video preprocessing step (colored video component extraction and padding) followed by four main steps: frame component splitting, frame component scrambling, key generation, and diffusion. The input colored video is preprocessed to extract individual frames. The three video components (channels), red, green, and blue, are separated from each frame and padded with zeros. The four main steps are applied to each frame channel independently. First, the plain video frame channel is split into blocks, and some blocks are further split into sub-blocks by a new frame channel dividing scheme. Second, a scrambled frame channel is obtained by applying a zigzag scan to the blocks and sub-blocks, rotating all blocks counterclockwise by 90°, and then shuffling the blocks randomly. Third, a key is generated based on the logistic map. Finally, the encrypted frame channel is obtained by applying the XOR function between the generated key and the scrambled frame channel.

The paper's contributions are summarized as follows:

  1. A novel splitting method is introduced for each frame channel.

  2. Random shuffling is performed between blocks to obtain a scrambled frame channel.

  3. The scrambled component is diffused using the logistic map, whose initial value is derived from the first input frame component; this makes the proposed method robust against differential attacks.

  4. The results show that the proposed scheme requires less processing time to encrypt colored videos than the compared schemes from the literature.

The rest of this paper is organized as follows. The proposed scheme is described in detail in Sect. 2. Section 3 presents the simulation results and security analysis. Finally, the work is concluded in Sect. 4.

2 The proposed video encryption method

This section describes the proposed method in detail. The raw colored video is preprocessed and encrypted into an unintelligible format, and the decryption process is applied to recover the original colored video. Figure 2 shows an illustrative diagram of all the steps.

Fig. 2 Colored video encryption visual diagram

2.1 Preprocessing the video

  A. Video component extraction: The proposed method is applied to each frame channel independently, so the input colored video is preprocessed to extract individual frames. Then, the frame channels are separated from each frame.

  B. Frame component padding: The encryption and decryption processes require the size of each input video frame to be a multiple of the block size. Therefore, after the frame components are separated, they are padded with zeros as needed to reach the next multiple of the block size.
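The extraction and padding steps above can be sketched in Python as follows. This is a minimal, illustrative sketch rather than the authors' MATLAB implementation: it assumes a frame is given as a list of rows of (R, G, B) tuples with 8-bit values, and the helper names `split_channels`, `pad_channel`, and `unpad_channel` are hypothetical.

```python
def split_channels(frame):
    """Separate a frame (rows of (R, G, B) tuples) into three 2-D channels."""
    return [[[pixel[c] for pixel in row] for row in frame] for c in range(3)]


def pad_channel(channel, g):
    """Zero-pad a 2-D channel so both dimensions become multiples of the block size g."""
    rows, cols = len(channel), len(channel[0])
    padded_rows = -(-rows // g) * g  # ceiling division, then back to a multiple of g
    padded_cols = -(-cols // g) * g
    padded = [list(row) + [0] * (padded_cols - cols) for row in channel]
    padded += [[0] * padded_cols for _ in range(padded_rows - rows)]
    return padded, (rows, cols)


def unpad_channel(padded, shape):
    """Remove the zero padding after decryption using the original shape."""
    rows, cols = shape
    return [row[:cols] for row in padded[:rows]]
```

For example, with a block dimension of 16, a 240 × 360 channel (as in Viptrain.avi) is padded to 240 × 368, while 192 × 352 and 240 × 320 channels need no padding. The original shape is kept so that the receiver can crop the decrypted channel.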

2.2 Encryption process

Here, the proposed scheme for encrypting colored video consists of four phases. These phases are performed on each channel independently. In the first phase, channel splitting is performed. In the second phase, channel scrambling (permutation) is applied. Key streams are generated from the logistic map in the third phase. The channel diffusion process is performed in the last phase.

2.2.1 Channel splitting

A raw frame channel is partitioned into blocks of equal size. The user can select a block dimension of 16, 32, or 64, which are the sizes suitable for the scheme. Then, a random vector whose length equals the number of blocks is generated. Based on this vector, each block is either further partitioned into sub-blocks or kept without partition.
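The channel splitting step can be sketched as follows. The text does not specify the sub-block size or how the random vector is produced, so this sketch assumes (hypothetically) a 0/1 split vector drawn from Python's `random.Random` with a seed, and a split of each selected \(g \times g\) block into four \(g/2 \times g/2\) sub-blocks. A real implementation would derive the vector from secret key material rather than a plain seed; `extract` and `split_channel` are illustrative names.

```python
import random


def extract(channel, top, left, size):
    """Copy the size x size square whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in channel[top:top + size]]


def split_channel(channel, g, seed):
    """Split a padded channel into g x g blocks; subdivide a block when its vector entry is 1."""
    rows, cols = len(channel), len(channel[0])
    rng = random.Random(seed)                        # hypothetical source of the random vector
    positions = [(r, c) for r in range(0, rows, g) for c in range(0, cols, g)]
    split_vector = [rng.randint(0, 1) for _ in positions]
    blocks = []
    for (r, c), subdivide in zip(positions, split_vector):
        if subdivide:
            h = g // 2                               # assumed sub-block size
            blocks.append([extract(channel, r + dr, c + dc, h)
                           for dr in (0, h) for dc in (0, h)])
        else:
            blocks.append([extract(channel, r, c, g)])
    return blocks, split_vector


channel = [[(r * 32 + c) % 256 for c in range(32)] for r in range(32)]
blocks, vector = split_channel(channel, g=16, seed=1)
```

Each entry of `blocks` holds either one \(g \times g\) unit or four sub-block units, so every pixel of the channel appears exactly once across all units.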

2.2.2 Channel scrambling

The arrangements of the frame channel's pixels are changed in this phase as follows:

  (a) The zigzag scan is used to permute the positions of the pixels in each block (undivided and subdivided blocks) of the divided channel.

  (b) Each block (undivided and subdivided) is rotated counterclockwise by 90°.

  (c) A random number is generated for every block in the divided channel, forming a vector \( r\).

  (d) Based on the vector \( r\), a random permutation of the blocks is performed to obtain the permuted frame channel.
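A minimal sketch of steps (a) to (d) is shown below, assuming square blocks stored as lists of rows. The zigzag scan here reads a block in JPEG-style zigzag order and writes the values back in row-major order; this is one plausible interpretation of permuting pixel positions with a zigzag scan, not necessarily the authors' exact definition. The vector \( r\) is drawn from a seeded `random.Random` purely for illustration, and the blocks are reordered by sorting \( r\). All function names are hypothetical.

```python
import random


def zigzag_order(n):
    """JPEG-style zigzag visiting order of the cells of an n x n block."""
    order = []
    for s in range(2 * n - 1):                 # s = i + j indexes the anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                         # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((i, s - i) for i in rows)
    return order


def zigzag_scan(block):
    """Read the block in zigzag order and write the values back row by row."""
    n = len(block)
    flat = [block[i][j] for i, j in zigzag_order(n)]
    return [flat[r * n:(r + 1) * n] for r in range(n)]


def rotate_ccw(block):
    """Rotate a square block counterclockwise by 90 degrees."""
    return [list(row) for row in zip(*block)][::-1]


def scramble_blocks(blocks, seed):
    """Steps (a) to (d): zigzag scan, rotation, then block shuffling driven by r."""
    rng = random.Random(seed)                  # hypothetical source of the vector r
    r = [rng.random() for _ in blocks]
    order = sorted(range(len(blocks)), key=r.__getitem__)
    transformed = [rotate_ccw(zigzag_scan(b)) for b in blocks]
    return [transformed[k] for k in order], order
```

Sorting \( r\) turns an arbitrary random vector into a permutation of block indices; the receiver only needs the same \( r\) to invert it.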

2.2.3 Key generation

A new key vector \(K\) from the logistic map is generated for every frame channel. The mathematical equation of the logistic map is:

$$ Y_{n + 1} = bY_{n} \left( {1 - Y_{n} } \right) $$
(1)

where 0 < \(b\) ≤ 4 and the starting value satisfies 0 < \(Y_{0}\) < 1. For most values of \(b\) ∈ [3.57, 4], the map is chaotic. The starting value \(Y_{0}\) depends on the input colored video. The key generation steps for every frame channel are:

  (a) The starting value of the logistic map is computed.

    • For the first key vector (for the first channel of the first frame), \(Y_{0}\) is calculated by:

      $$ Y_{0} = \frac{{\mathop \sum \nolimits_{i = 1}^{M} \mathop \sum \nolimits_{j = 1}^{N} C\left( {i,j} \right)}}{M \times N \times 255 \times 3} + 10^{ - 20} $$
      (2)

      where \(C\) is the input frame channel, and \(M\) and \(N\) are its numbers of rows and columns.

    • For the other key vectors (for the other channels in the same frame or in other frames), \(Y_{0}\) equals the last value of the chaotic sequence that generated the previous key vector, i.e., \(S\left( {MN} \right)\) of the previously processed channel.

  (b) Obtain a sequence \(S_{\text{temp}}\) by iterating Eq. (1) \( N_{0} + MN \) times, and then generate a new sequence \(S\) of length \( MN\) by discarding the first \(N_{0}\) values of \(S_{\text{temp}}\).

  (c) Generate the key vector \(K\) using Eq. (3):

    $$ K\left( i \right) = mod\left( {floor\left( {S\left( i \right) \times 10^{14} } \right), 256} \right), \quad i = 1, \ldots ,MN $$
    (3)
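The key generation steps can be sketched as follows. The functions `initial_value` and `logistic_key` are hypothetical names; the sketch follows Eqs. (1) to (3) directly with double-precision arithmetic. For chaining between channels it assumes that the next starting value is the last chaotic value \(S\left( {MN} \right)\), since an integer key byte in 0 to 255 cannot serve directly as a starting value in (0, 1). Note that the \(10^{-20}\) offset in Eq. (2) only matters when the channel sum is zero.

```python
import math


def initial_value(channel):
    """Eq. (2): normalized pixel sum of the first frame channel plus 1e-20."""
    m, n = len(channel), len(channel[0])
    total = sum(sum(row) for row in channel)
    return total / (m * n * 255 * 3) + 1e-20


def logistic_key(y0, b, n0, length):
    """Eqs. (1) and (3): iterate the logistic map, skip n0 values, quantize to bytes.

    Returns the key vector K and the last chaotic value S(MN), which this sketch
    assumes is used as Y0 for the next channel.
    """
    y = y0
    for _ in range(n0):                 # transient values that are discarded
        y = b * y * (1 - y)
    key = []
    for _ in range(length):
        y = b * y * (1 - y)
        key.append(math.floor(y * 1e14) % 256)
    return key, y


# Chain the keys for the three channels of one frame (b = 3.9 and N0 = 1000, as in Sect. 3).
first_channel = [[(3 * r + c) % 256 for c in range(16)] for r in range(16)]
y0 = initial_value(first_channel)
keys = []
for _ in range(3):
    k, y0 = logistic_key(y0, b=3.9, n0=1000, length=16 * 16)
    keys.append(k)
```

Multiplying by \(10^{14}\) before taking the value modulo 256 keeps only low-order digits of each chaotic value, which is what spreads the key bytes over the full 0 to 255 range.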

2.2.4 Channel diffusion

In this phase, a bitwise exclusive OR (XOR) is applied between every value in the generated key vector and the corresponding value in the permuted frame channel vector. Changing the channel pixel values in this way produces the encrypted frame channel. Algorithm 1 presents the steps of the encryption process, and Fig. 3 shows the flowchart of the scheme phases.
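Because the diffusion is a plain element-wise XOR, it is its own inverse, which is why the first decryption step reuses the same operation. The sketch below (hypothetical function `diffuse`) assumes that the permuted channel has been flattened into a vector of 8-bit values with the same length \(MN\) as the key vector \(K\).

```python
def diffuse(channel_vector, key):
    """XOR each permuted pixel with the key byte at the same position."""
    if len(channel_vector) != len(key):
        raise ValueError("channel vector and key vector must have the same length")
    return [pixel ^ k for pixel, k in zip(channel_vector, key)]


cipher = diffuse([0, 255, 170], [255, 255, 15])
print(cipher)                           # [255, 0, 165]
print(diffuse(cipher, [255, 255, 15]))  # [0, 255, 170]
```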

Fig. 3 Flowchart of the proposed scheme

2.3 Decryption process

The decryption process can be constructed by inverting the encryption phases with the original keys to get the plain channels of each frame. The decryption steps are:

  (1) A bitwise exclusive OR is performed between every value in the key vector and the corresponding value in the encrypted frame channel vector.

  (2) The channel blocks are moved back to their original positions based on the random vector \( r\).

  (3) A rotation by −90° (i.e., clockwise by 90°) and the inverse zigzag scan are applied to all blocks to restore the original positions of the pixels.
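The inverse of the scrambling phase can be sketched as below. It keeps the assumptions of a simple illustrative model: JPEG-style zigzag order written back row by row, a counterclockwise 90° rotation, and a block order obtained by sorting a seeded random vector that the receiver can reproduce. All function names are hypothetical. The round-trip check at the end confirms that steps (2) and (3) undo the scrambling under these assumptions.

```python
import random


def zigzag_order(n):
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        order.extend((i, s - i) for i in (reversed(rows) if s % 2 == 0 else rows))
    return order


def zigzag_scan(block):
    n = len(block)
    flat = [block[i][j] for i, j in zigzag_order(n)]
    return [flat[r * n:(r + 1) * n] for r in range(n)]


def inverse_zigzag(block):
    """Undo zigzag_scan: put the k-th row-major value back at the k-th zigzag cell."""
    n = len(block)
    flat = [v for row in block for v in row]
    out = [[0] * n for _ in range(n)]
    for k, (i, j) in enumerate(zigzag_order(n)):
        out[i][j] = flat[k]
    return out


def rotate_ccw(block):
    return [list(row) for row in zip(*block)][::-1]


def rotate_cw(block):
    """Rotation by -90 degrees, the inverse of rotate_ccw."""
    return [list(row) for row in zip(*block[::-1])]


def block_order(count, seed):
    rng = random.Random(seed)                  # must reproduce the encryption-side r
    r = [rng.random() for _ in range(count)]
    return sorted(range(count), key=r.__getitem__)


def scramble(blocks, seed):
    order = block_order(len(blocks), seed)
    transformed = [rotate_ccw(zigzag_scan(b)) for b in blocks]
    return [transformed[k] for k in order]


def unscramble(blocks, seed):
    """Decryption steps (2) and (3): restore block positions, rotate back, undo zigzag."""
    order = block_order(len(blocks), seed)
    restored = [None] * len(blocks)
    for position, k in enumerate(order):
        restored[k] = blocks[position]
    return [inverse_zigzag(rotate_cw(b)) for b in restored]


rng = random.Random(1)
original = [[[rng.randrange(256) for _ in range(4)] for _ in range(4)] for _ in range(6)]
print(unscramble(scramble(original, seed=42), seed=42) == original)  # True
```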

3 Simulation results and security analysis

This section examines the privacy and robustness of the colored video encryption scheme. The colored videos used for testing are Train.avi (192 × 352 × 3), Rhinos.avi (240 × 320 × 3), Viptrain.avi (240 × 360 × 3), and Flamingo.avi (192 × 352 × 3), taken from Valli and Ganesan [35], and Foreman.avi (352 × 288 × 3), downloaded from YUV Sequences [42]. Figure 4 shows samples of the test videos. The proposed scheme is implemented in MATLAB (R2015a) on a laptop with the following specifications: Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz (2.21 GHz), 16 GB memory, and Windows 11 OS. The initial parameters of the algorithm are as follows: in the channel splitting step, the block dimension is \(g = 2^{n} = 16\) (i.e., \(n = 4\)); the logistic map parameter is \(b = 3.9\); and \(N_{0} = 1000\) elements are skipped.

Fig. 4 Original, encrypted, and decrypted videos

3.1 Visual analysis

Several evaluation metrics are used to assess the proposed scheme. The first is visual inspection. The encryption and decryption results for the test videos are displayed in Fig. 4. The results indicate that the scheme hides all visual details of the test videos and that the receiver restores the original videos successfully.

3.2 Histogram analysis

A histogram is an essential tool for evaluating the efficiency of an encryption scheme. It represents the number of occurrences of each pixel value in a frame channel. A flat histogram of the encrypted frame channel indicates that it can resist different types of statistical attacks [43]. Figures 5, 6, 7, 8, and 9 show the histograms of the 10th original, encrypted, and decrypted frames of the test videos. The histograms of the encrypted frames are nearly uniform and differ from the histograms of the corresponding original frames.

Fig. 5 Histogram for Flamingo video

Fig. 6 Histogram for Rhinos video

Fig. 7 Histogram for Train video

Fig. 8 Histogram for Viptrain video

Fig. 9 Histogram for Foreman video

Consequently, the proposed scheme hides the statistical patterns in the frames of the test videos. Additionally, the histograms of the decrypted frames are identical to those of the corresponding original frames, so the scheme recovers the original frames from the encrypted ones successfully.

3.3 Correlation analysis

In natural video frames, neighboring pixels are highly correlated because their intensity values are nearly the same. These correlations must be reduced to protect the video frame against different attacks. The correlation between pairs of adjacent pixels can be calculated using the following equations.

$$ r_{A,B } = \frac{{E\left( {\left( {A - E\left( A \right)} \right)\left( {B - E\left( B \right)} \right)} \right)}}{{\sqrt {D\left( A \right)D\left( B \right)} }} $$
(4)
$$ E\left( A \right) = \frac{1}{s} \mathop \sum \limits_{i = 1}^{s} A_{i} $$
(5)
$$ D\left( A \right) = \frac{1}{s} \mathop \sum \limits_{i = 1}^{s} \left( {A_{i} - E\left( A \right)} \right)^{2} $$
(6)

where \(A\) and \(B\) are the values of the first and second pixels in each selected adjacent pair, and \(s\) is the total number of selected pairs. Figures 10, 11, and 12 show the horizontal (H), vertical (V), and diagonal (D) correlation distributions of 6000 randomly selected pairs of neighboring pixels in the 10th original and encrypted frames of the Flamingo test video. The correlation coefficients of 6000 random pairs of adjacent pixels in the H, V, and D directions for the 10th original and encrypted frames of the test videos are presented in Table 1. The values for the original frames are close to one, whereas the values for the encrypted frames are very close to zero. Hence, adjacent pixels in the encrypted frames are practically uncorrelated, which indicates that the proposed scheme can resist correlation-based statistical attacks.
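The adjacent-pixel correlation test can be sketched as follows, assuming a channel stored as a list of rows. The helper `adjacent_correlation` (a hypothetical name) samples random pixel positions, pairs each with its right (H), lower (V), or lower-right (D) neighbor, and applies Eqs. (4) to (6) with the \(1/s\) normalization. A constant channel would give zero variance, so the sketch assumes non-constant input.

```python
import math
import random

OFFSETS = {"H": (0, 1), "V": (1, 0), "D": (1, 1)}   # neighbor offset per direction


def correlation(a, b):
    """Eqs. (4) to (6): correlation coefficient with 1/s-normalized moments."""
    s = len(a)
    ea, eb = sum(a) / s, sum(b) / s
    cov = sum((x - ea) * (y - eb) for x, y in zip(a, b)) / s
    da = sum((x - ea) ** 2 for x in a) / s
    db = sum((y - eb) ** 2 for y in b) / s
    return cov / math.sqrt(da * db)


def adjacent_correlation(channel, direction, pairs=6000, seed=0):
    """Sample random pixels and pair each with its H, V, or D neighbor."""
    dr, dc = OFFSETS[direction]
    rows, cols = len(channel), len(channel[0])
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(pairs):
        i, j = rng.randrange(rows - dr), rng.randrange(cols - dc)
        a.append(channel[i][j])
        b.append(channel[i + dr][j + dc])
    return correlation(a, b)


smooth = [[r + c for c in range(100)] for r in range(100)]                # highly correlated
rng = random.Random(5)
noise = [[rng.randrange(256) for _ in range(100)] for _ in range(100)]    # cipher-like
```

On the smooth ramp every neighbor is a fixed offset of its partner, so the coefficient is essentially 1, while on independent random values it stays near 0, mirroring the contrast between original and encrypted frames in Table 1.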

Fig. 10 Correlation distribution for the red channel of Flamingo video

Fig. 11 Correlation distribution for the green channel of Flamingo video

Fig. 12 Correlation distribution for the blue channel of Flamingo video

Table 1 Correlation coefficients for various videos

3.4 Entropy analysis

Information entropy measures the randomness of video frames. The Shannon entropy of a video frame is defined as

$$ H\left( m \right) = \mathop \sum \limits_{i = 1}^{w} p\left( {m_{i} } \right)\log_{2} \frac{1}{{p\left( {m_{i} } \right)}} $$
(7)

where \(m_{i}\) is the \(i\)th gray value in a video frame, \(p\left( {m_{i} } \right)\) is the probability of \(m_{i}\) in the frame, and \(w\) is the number of gray levels (256 for 8-bit frames). For 8-bit frames, the maximum entropy is 8, so the entropy of a well-encrypted frame should be close to 8. The entropy values for the 10th frame of the test videos are presented in Table 2. All values are close to 8, which indicates that the videos protected by the proposed scheme are robust against entropy-based attacks.
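Eq. (7) can be sketched as follows for a channel stored as a list of rows (the function name `entropy` is illustrative). Gray values that never occur contribute nothing, following the usual convention \(0 \log_2(1/0) = 0\). The two examples show the extremes: every gray value occurring equally often gives the maximum of 8 bits, and a single gray value gives 0.

```python
import math
from collections import Counter


def entropy(channel):
    """Eq. (7): Shannon entropy (bits) of the gray values in a 2-D channel."""
    values = [v for row in channel for v in row]
    total = len(values)
    probabilities = (count / total for count in Counter(values).values())
    return sum(p * math.log2(1 / p) for p in probabilities)


uniform = [[16 * r + c for c in range(16)] for r in range(16)]  # each gray value once
flat = [[128] * 16 for _ in range(16)]                           # a single gray value
print(entropy(uniform))  # 8.0
print(entropy(flat))     # 0.0
```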

Table 2 Entropy values for various videos

3.5 Differential attack

In a differential attack, an adversary makes a slight change to an original video frame, encrypts both the original and the modified frames with the same encryption scheme, and compares the two encrypted frames to find relationships between the plain and encrypted frames. Therefore, the encryption scheme should produce a completely different encrypted frame for any slight change in the original frame. The metrics used to evaluate this property are the number of pixels change rate (NPCR) and the unified average changing intensity (UACI), calculated as follows:

$$ {\text{NPCR}} = \frac{1}{MN} \mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} D\left( {i,j} \right) \times 100\left( \% \right) $$
$$ D\left( {i,j} \right) = \left\{ {\begin{array}{*{20}c} 0 & {{\text{if}}\; C_{1} \left( {i,j} \right) = C_{2} \left( {i,j} \right),} \\ 1 & {{\text{if}}\; C_{1} \left( {i,j} \right) \ne C_{2} \left( {i,j} \right),} \\ \end{array} } \right. $$
(8)
$$ {\text{UACI}} = \frac{1}{MN} \mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} \frac{{\left| {C_{1} \left( {i,j} \right) - C_{2} \left( {i,j} \right)} \right|}}{255} \times 100\left( \% \right) $$
(9)

where \(C_{1}\) and \(C_{2}\) are the encrypted versions of the plain frame and the modified frame, respectively. The modified frame is obtained by changing one pixel in the plain video frame, and \(M\) and \(N\) are the dimensions of the video frame. The ideal values for NPCR and UACI are 99.6094% and 33.4635%, respectively. The NPCR and UACI values obtained by applying the proposed scheme to the 10th frame of the test videos are presented in Table 3. The values are very close to the ideal values, which indicates that the videos encrypted by the proposed scheme are highly resistant to differential attacks.
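Eqs. (8) and (9) can be sketched as follows (the function name `npcr_uaci` is illustrative). The ideal values quoted above are the expected NPCR and UACI of two independent, uniformly random 8-bit frames, so the example compares two random frames generated with a fixed seed and should land close to them.

```python
import random


def npcr_uaci(c1, c2):
    """Eqs. (8) and (9) for two encrypted 8-bit frames of equal size M x N."""
    m, n = len(c1), len(c1[0])
    changed, intensity = 0, 0.0
    for row1, row2 in zip(c1, c2):
        for x, y in zip(row1, row2):
            changed += x != y                 # D(i, j) in Eq. (8)
            intensity += abs(x - y) / 255
    return 100 * changed / (m * n), 100 * intensity / (m * n)


# Two independent random frames approach the ideal values 99.6094% and 33.4635%.
rng = random.Random(0)
c1 = [[rng.randrange(256) for _ in range(128)] for _ in range(128)]
c2 = [[rng.randrange(256) for _ in range(128)] for _ in range(128)]
npcr, uaci = npcr_uaci(c1, c2)
```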

Table 3 The NPCR and UACI values for various videos

3.6 Encryption quality analysis

3.6.1 Histogram deviation (\({\varvec{D}}_{{\varvec{H}}}\))

This metric evaluates the encryption quality of the proposed scheme by measuring how much the pixel-value histogram of the encrypted video frame deviates from that of the original frame. The maximum deviation is estimated by:

$$ D_{H} = \frac{{K_{0} + K_{255} }}{2} + \mathop \sum \limits_{i = 1}^{254} K_{i} $$
(10)

where \(K_{i}\) is the absolute difference between the histograms of the original and encrypted frames at gray value \( i\). A larger value indicates a greater deviation of the encrypted video frame from the original one. Table 4 presents the \(D_{H}\) values between the 10th original and encrypted frames of the test videos. The \(D_{H}\) values are large, which indicates the good encryption quality of the proposed scheme.
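Eq. (10) is a trapezoidal-rule sum of the absolute histogram difference over the gray values 0 to 255. A minimal sketch for 8-bit channels stored as lists of rows is shown below; the names `histogram` and `histogram_deviation` are illustrative.

```python
def histogram(channel):
    """256-bin histogram of an 8-bit channel."""
    h = [0] * 256
    for row in channel:
        for v in row:
            h[v] += 1
    return h


def histogram_deviation(original, encrypted):
    """Eq. (10): K_i = |hist_original(i) - hist_encrypted(i)|, trapezoid-style sum."""
    k = [abs(a - b) for a, b in zip(histogram(original), histogram(encrypted))]
    return (k[0] + k[255]) / 2 + sum(k[1:255])


black = [[0] * 4 for _ in range(4)]
gray = [[100] * 4 for _ in range(4)]
print(histogram_deviation(black, black))  # 0.0
print(histogram_deviation(black, gray))   # 24.0
```

In the second example all 16 pixels move from gray value 0 to 100, so \(K_{0} = K_{100} = 16\), and the end bin is weighted by one half.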

Table 4 Histogram deviation and irregular deviation values for various videos

3.6.2 Irregular deviation (\({\varvec{D}}_{{\varvec{I}}}\))

This metric measures how far the histogram of the deviation between the original and encrypted video frames is from a uniform distribution, i.e., the irregularity of the deviation caused by an encryption algorithm. The irregular deviation can be estimated by:

$$ D_{I} = \frac{{\mathop \sum \nolimits_{i = 0}^{255} \left| {H\left( i \right) - A} \right|}}{M \times N} $$
(11)

where \(H\left( i \right)\) is the value at index \( i\) of the histogram of the absolute difference between the original and encrypted video frames, and \(A\) is the mean value of this histogram (i.e., \(M \times N/256\)). A lower value of \(D_{I}\) indicates a more uniform distribution and hence a higher quality of the encrypted video. Table 4 presents the \(D_{I}\) values between the 10th original and encrypted frames of the test videos. The \(D_{I}\) values are low, which indicates the high quality of the encrypted videos and hence the strength of the proposed scheme.
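Eq. (11) can be sketched as follows for 8-bit channels stored as lists of rows (the function name is illustrative). The histogram is taken over the absolute pixel-wise difference, and its mean \(A\) is always \(M \times N/256\). The two examples give the best case, where every difference value occurs equally often, and the worst case, where the frames are identical.

```python
def irregular_deviation(original, encrypted):
    """Eq. (11): deviation of the |original - encrypted| histogram from its mean A."""
    m, n = len(original), len(original[0])
    h = [0] * 256
    for row1, row2 in zip(original, encrypted):
        for x, y in zip(row1, row2):
            h[abs(x - y)] += 1
    a = sum(h) / 256                           # A = M*N/256
    return sum(abs(v - a) for v in h) / (m * n)


zeros = [[0] * 16 for _ in range(16)]
ramp = [[16 * r + c for c in range(16)] for r in range(16)]
print(irregular_deviation(zeros, ramp))   # 0.0  (every difference value occurs once)
print(irregular_deviation(zeros, zeros))  # 1.9921875  (all differences are zero)
```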

3.7 PSNR, SSIM, and FSIM analysis

The peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and feature similarity index (FSIM) are used to evaluate the quality of the encryption and decryption processes. First, the PSNR, SSIM, and FSIM are computed between the original and encrypted video frames; low values indicate an effective encryption process. Then, the same metrics are computed between the original and decrypted video frames; high values indicate an accurate decryption process.

  (1) The PSNR measures the ratio between the maximum possible power of a signal and the power of the distorting noise. The PSNR for an 8-bit video frame component is calculated by:

    $$ {\text{PSNR}} = 10 \times \log_{10} \left( {\frac{255 \times 255}{{{\text{MSE}}}}} \right) \left( {{\text{dB}}} \right) $$
    $$ {\text{MSE}} = \frac{1}{MN} \times \mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} \left| {F_{1} \left( {i,j} \right) - F_{2} \left( {i,j} \right)} \right|^{2} $$

    where \(F_{1}\) is the original video frame component and \(F_{2}\) is the encrypted (or decrypted) one. Low PSNR values between an original video frame component and the corresponding encrypted one indicate a good encryption process. Table 5 shows the PSNR values between the 10th original and encrypted frames of the test videos; the values are low, indicating that the encryption process is efficient. Table 6 shows the PSNR values between the 10th original and decrypted frames; the values are high, indicating that the decryption process is accurate.

  (2) The SSIM index measures the similarity between two video frames and takes values between −1 and 1. The SSIM value is calculated by:

    $$ {\text{SSIM}}\left( {x,y} \right) = \frac{{\left( {2\mu_{x} \mu_{y} + c_{1} } \right)\left( {2\sigma_{xy} + c_{2} } \right)}}{{\left( {\mu_{x}^{2} + \mu_{y}^{2} + c_{1} } \right)\left( {\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2} } \right)}} $$

    where \( \mu_{x}\) and \( \mu_{y}\) are the mean values of the original frame \(x\) and the encrypted (or decrypted) frame \( y\), respectively; \(\sigma_{x}^{2}\) and \(\sigma_{y}^{2}\) are the corresponding variances; \(\sigma_{xy}\) is the covariance of \(x\) and \(y\); and \(c_{1}\) and \(c_{2}\) are small constants that stabilize the division. Table 5 presents the SSIM values between the 10th original and encrypted frames, and Table 6 presents the SSIM values between the 10th original and decrypted frames of the test videos. Low SSIM values between the original and encrypted frames indicate an efficient encryption process, whereas high SSIM values between the original and decrypted frames indicate an accurate decryption process. Tables 5 and 6 show low values in the first case and high values in the second, confirming the quality of the encryption and decryption processes.

  (3) FSIM evaluates the local feature similarity between two video frames. The FSIM value is calculated by:

    $$ {\text{FSIM}} = \frac{{\mathop \sum \nolimits_{{x \in {\Omega }}} S_{L} \left( x \right) \cdot PC_{m} \left( x \right)}}{{\mathop \sum \nolimits_{{x \in {\Omega }}} PC_{m} \left( x \right)}} $$

    where \(S_{L} \left( x \right)\) is the overall local similarity between the two video frames at location \(x\), \({\Omega }\) is the spatial domain of the frame, and \(PC_{m} \left( x \right)\) is the maximum phase congruency of the two frames at \(x\). Table 5 presents the FSIM values between the 10th original and encrypted frames, and Table 6 presents the FSIM values between the 10th original and decrypted frames of the test videos. Low FSIM values between the original and encrypted frames indicate an efficient encryption process, whereas high FSIM values between the original and decrypted frames indicate an accurate decryption process. Tables 5 and 6 show low values in the first case and high values in the second, confirming the quality of the encryption and decryption processes.
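The PSNR and SSIM formulas above can be sketched as follows for 8-bit frame components stored as lists of rows. Two assumptions apply: `global_ssim` evaluates the SSIM formula once over the whole frame, whereas the usual SSIM (and MATLAB's `ssim`) averages it over local windows, so the numbers will differ from Tables 5 and 6; and \(c_{1} = (0.01 \times 255)^{2}\), \(c_{2} = (0.03 \times 255)^{2}\) are the commonly used defaults. FSIM is omitted because it requires a phase congruency computation.

```python
import math


def psnr(f1, f2):
    """PSNR in dB between two 8-bit frame components of size M x N."""
    m, n = len(f1), len(f1[0])
    mse = sum((x - y) ** 2 for r1, r2 in zip(f1, f2) for x, y in zip(r1, r2)) / (m * n)
    return math.inf if mse == 0 else 10 * math.log10(255 * 255 / mse)


def global_ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM formula evaluated once over the whole frame (no sliding window)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    s = len(xs)
    mx, my = sum(xs) / s, sum(ys) / s
    vx = sum((v - mx) ** 2 for v in xs) / s
    vy = sum((v - my) ** 2 for v in ys) / s
    cxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / s
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


frame = [[(7 * r + 3 * c) % 256 for c in range(32)] for r in range(32)]
print(psnr(frame, frame))                    # inf
print(psnr([[0] * 8] * 8, [[255] * 8] * 8))  # 0.0
```

A lossless decryption gives an infinite PSNR because the MSE is zero, which is why "high" PSNR values are reported between original and decrypted frames.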

Table 5 PSNR, SSIM, and FSIM values between the original and encrypted frames of various videos
Table 6 PSNR, SSIM, and FSIM values between the original and decrypted frames of various videos

3.8 Chosen-plaintext and known-plaintext attacks analysis

In this section, the resistance of the proposed scheme against chosen-plaintext and known-plaintext attacks is tested using two special videos: an all-white video and an all-black video. Both videos are encrypted with the proposed scheme, and the original and encrypted videos are shown in Fig. 13. The encrypted videos reveal no useful information, which indicates that the proposed scheme is robust against chosen-plaintext and known-plaintext attacks. Table 7 shows the entropy values of the original and encrypted videos; the entropy of the encrypted videos is very close to the optimal value of 8, reflecting the strength of the proposed scheme.

Fig. 13 Original and encrypted version of white and black videos

3.9 Edges detection analysis

The encryption scheme must conceal the edge information of the video. The edge differential ratio (EDR) metric is used in this experiment to estimate the edge distortion and is defined by:

$$ EDR = \frac{{\mathop \sum \nolimits_{i,j = 1}^{k} \left| {P\left( {i,j} \right) - \overline{P}\left( {i,j} \right)} \right|}}{{\mathop \sum \nolimits_{i,j = 1}^{k} \left| {P\left( {i,j} \right) + \overline{P}\left( {i,j} \right)} \right|}} $$

where \(P\left( {i,j} \right)\) and \(\overline{P}\left( {i,j} \right)\) are the pixel values of the binary edge maps of the original and encrypted videos, respectively, and the sums are taken over the pixel positions \((i, j)\) of the edge maps. The EDR value should be close to one, which means that the edges of the original and encrypted videos are dissimilar. The EDR values between the 10th original and encrypted frames of the test videos are presented in Table 8. The values are close to one, which indicates that the original and encrypted videos are different. The Laplacian of Gaussian edge detection results for the 10th original, encrypted, and decrypted frames of the test videos are displayed in Fig. 14. The results show a large difference between the edges of the original and encrypted frames, so the proposed scheme hides the main details of the videos. In addition, the edges of the original frames are similar to those of the decrypted frames, which confirms the efficiency of the decryption.
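The EDR formula can be sketched for two binary edge maps (1 = edge pixel) as follows; the edge detection itself (e.g., Laplacian of Gaussian) is outside this sketch, and the function name `edr` is illustrative. Identical maps give 0, maps with no common edge pixel give 1, and partial overlap falls in between. If neither map contains an edge pixel, the denominator is zero, so the sketch assumes at least one edge pixel.

```python
def edr(edges_original, edges_encrypted):
    """Edge differential ratio between two binary edge maps of equal size."""
    num = den = 0
    for r1, r2 in zip(edges_original, edges_encrypted):
        for p, q in zip(r1, r2):
            num += abs(p - q)
            den += abs(p + q)
    return num / den


print(edr([[1, 1], [0, 0]], [[1, 1], [0, 0]]))  # 0.0  identical edges
print(edr([[1, 1], [0, 0]], [[0, 0], [1, 1]]))  # 1.0  disjoint edges
print(edr([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # 0.5  partial overlap
```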

3.10 Keyspace analysis

A colored video encryption scheme should have a large keyspace to be robust and secure; a keyspace of at least \(2^{100}\) is generally considered sufficient to resist brute-force attacks. The proposed scheme uses several initial values to generate the secret key: the starting value \( Y_{0}\) and the control parameter \( b\) of the logistic map, and the number of skipped elements \( N_{0}\). Assuming a precision of \(10^{16}\) for each of \( Y_{0}\) and \( b\) and a precision of \(10^{3}\) for \( N_{0}\), the total keyspace is \(10^{16} \times 10^{16} \times 10^{3} = 10^{35} \approx 2^{116}\). Because this keyspace is larger than \(2^{100}\), the proposed scheme can withstand brute-force attacks.
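The keyspace arithmetic can be checked directly with Python's exact integers. The three precisions are the assumed values from the keyspace analysis; the variable names are illustrative.

```python
import math

y0_values = 10 ** 16     # assumed precision of Y0
b_values = 10 ** 16      # assumed precision of b
n0_values = 10 ** 3      # assumed precision of N0
keyspace = y0_values * b_values * n0_values

print(keyspace == 10 ** 35)           # True
print(keyspace > 2 ** 100)            # True
print(round(math.log2(keyspace), 2))  # 116.27
```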

3.11 Key sensitivity analysis

Any slight modification to the secret key of an encryption scheme should produce a considerably different result, because an adversary may try to decrypt with a key that is very close to the correct one. To test key sensitivity, a test video frame is encrypted with key 1, generated from the chaotic map with starting value \( Y_{0} = z\), as shown in Fig. 15B. The encrypted frame is then decrypted twice: once with a slightly changed starting value \( Y_{0} = z + 10^{ - 10}\), as shown in Fig. 15C, and once with key 1, as shown in Fig. 15D. Only the key used in the encryption process restores the original frame; decryption with a slightly modified key fails.
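The key sensitivity test can be sketched on the XOR stage alone. This is an illustration under stated assumptions, not the authors' experiment: `z = 0.41` is a hypothetical starting value, \(b = 3.9\) and \(N_{0} = 1000\) follow the experimental settings, and the plain vector is a synthetic stand-in for a permuted channel. Because the logistic map is chaotic, a \(10^{-10}\) change in \(Y_{0}\) produces an essentially unrelated key stream after the discarded transient, so almost every pixel decrypts incorrectly.

```python
import math


def logistic_key(y0, b=3.9, n0=1000, length=10_000):
    """Key bytes from Eqs. (1) and (3) after discarding n0 transient values."""
    y = y0
    for _ in range(n0):
        y = b * y * (1 - y)
    key = []
    for _ in range(length):
        y = b * y * (1 - y)
        key.append(math.floor(y * 1e14) % 256)
    return key


z = 0.41                                    # hypothetical starting value
key1 = logistic_key(z)
key2 = logistic_key(z + 1e-10)              # slightly modified starting value
plain = [i % 256 for i in range(10_000)]    # stand-in for a permuted channel vector
cipher = [p ^ k for p, k in zip(plain, key1)]
wrong = [c ^ k for c, k in zip(cipher, key2)]
right = [c ^ k for c, k in zip(cipher, key1)]
mismatch = sum(w != p for w, p in zip(wrong, plain)) / len(plain)
```

For independent key streams, a decrypted byte matches the plain byte only by chance (about 1 in 256), so `mismatch` is expected to be close to 1, while `right` equals `plain`.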

3.12 Channel noises attack analysis

After the video is encrypted, it may be transmitted through different communication channels, where the encrypted video may be affected by noise. Therefore, different types of noise are added to the encrypted videos to evaluate the robustness of the proposed scheme's decryption process.

Table 7 Entropy values for original and encrypted white and black videos
Table 8 EDR values for various videos

3.12.1 Salt & peppers noise

In this experiment, salt and pepper noise with a density of 0.005 is added to the encrypted frames of the test videos, and then the decryption process is performed. This noise appears as randomly scattered black and white dots on the video frame. Figure 16 shows that the decrypted videos are still intelligible despite the noise, which demonstrates the robustness of the proposed scheme.
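One reason the decrypted frames remain intelligible can be illustrated on the XOR stage alone: the diffusion has no chaining between pixels, so each corrupted cipher pixel corrupts exactly one decrypted pixel, and the inverse block permutation, rotation, and zigzag only move that pixel. In the sketch below, the noise model (each pixel replaced by 0 or 255 with total probability equal to the density, half each) is an assumption, and the random key is a stand-in for the logistic-map key; `add_salt_and_pepper` is a hypothetical name.

```python
import random


def add_salt_and_pepper(pixels, density, seed=0):
    """Replace roughly `density` of the pixels with 0 or 255 (half each)."""
    rng = random.Random(seed)
    noisy = []
    for p in pixels:
        u = rng.random()
        if u < density / 2:
            noisy.append(0)          # pepper
        elif u < density:
            noisy.append(255)        # salt
        else:
            noisy.append(p)
    return noisy


rng = random.Random(3)
plain = [rng.randrange(256) for _ in range(10_000)]
key = [rng.randrange(256) for _ in range(10_000)]    # stand-in for the logistic key
cipher = [p ^ k for p, k in zip(plain, key)]
noisy = add_salt_and_pepper(cipher, density=0.005)
decrypted = [c ^ k for c, k in zip(noisy, key)]
corrupted_cipher = sum(a != b for a, b in zip(cipher, noisy))
corrupted_plain = sum(a != b for a, b in zip(plain, decrypted))
```

With a density of 0.005, about 50 of the 10,000 pixels are expected to be corrupted, and `corrupted_plain` always equals `corrupted_cipher`.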

Fig. 14 Laplacian of Gaussian edge detection results of the original, encrypted, and decrypted frame number 10 for various videos

Fig. 15 Sensitivity of the key for Train video

Fig. 16 Salt and pepper noise

3.12.2 Gaussian noise

This type of noise occurs due to the limitation of the sensor during the acquisition of the video frames under low-light conditions. In this experiment, the Gaussian noise with a variance value of 0.005 is added to various encrypted video frames, and the decryption process is performed. Figure 17 shows that the decrypted videos are still intelligible, despite the effect of the noise on the video frames.

Fig. 17 Gaussian noise

3.13 Occlusion attack analysis

This section examines whether the proposed scheme can still decrypt an encrypted video when part of it is dropped or lost during transmission. Figure 18 shows the occluded encrypted video frames and the corresponding decrypted frames; the results indicate that the proposed scheme resists the occlusion attack.
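
An occlusion attack is usually simulated by overwriting a region of the encrypted frame before decryption. The sketch below assumes a rectangular region filled with zeros; the helper `occlude`, the 64 × 64 stand-in frame, and the quarter-frame block are hypothetical, since the occlusion sizes used in Fig. 18 are not stated in the text.

```python
def occlude(frame, top, left, height, width, fill=0):
    """Return a copy of a 2-D frame (list of rows) with a rectangular region
    set to `fill`, simulating data lost during transmission."""
    out = [row[:] for row in frame]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out

# Hypothetical 64 x 64 encrypted channel and a quarter-frame occlusion.
enc = [[(7 * r + 13 * c) % 256 for c in range(64)] for r in range(64)]
occ = occlude(enc, top=0, left=0, height=32, width=32)
lost_fraction = (32 * 32) / (64 * 64)
print(f"lost fraction: {lost_fraction:.0%}")
```

The occluded frame `occ` would then be decrypted, and the visibility of the recovered content judged as in Fig. 18.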

Fig. 18 Occlusion analysis

3.14 Execution time

A practical security scheme should encrypt and decrypt a colored video with low processing time. The processing time of the proposed scheme was measured on various videos; each experiment was repeated several times, and the average results are presented in Table 9. The results indicate that the proposed scheme encrypts and decrypts videos fast enough to meet the requirements of IoT devices.
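
Averaging repeated runs, as described above, can be done with a small timing harness. In this sketch `toy_encrypt` is a hypothetical stand-in (a single-byte XOR) for the scheme's per-frame encryption, and the 352 × 288 RGB frame size and five runs are illustrative assumptions.

```python
import statistics
import time

def average_time(fn, *args, runs=5):
    """Call fn(*args) `runs` times; return the mean and sample standard
    deviation of the wall-clock time in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    spread = statistics.stdev(times) if len(times) > 1 else 0.0
    return statistics.mean(times), spread

def toy_encrypt(frame_bytes, key=0x5A):
    """Hypothetical stand-in for encrypting one frame."""
    return bytes(b ^ key for b in frame_bytes)

frame = bytes(352 * 288 * 3)  # one hypothetical 352 x 288 RGB frame
mean_s, std_s = average_time(toy_encrypt, frame, runs=5)
print(f"average: {mean_s * 1e3:.2f} ms (sd {std_s * 1e3:.2f} ms)")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short intervals.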

Table 9 The average execution time

3.15 Time complexity analysis

The complexity of each phase of the proposed scheme is computed first, and the overall complexity is then derived. Let \(M\) and \(N\) be the number of rows and columns of the input video frame, and let \(g = 2^{n}\) be the block dimension, where \(n = 4\). Because the channel splitting and scrambling phases operate on the \((M \times N)/g^{2}\) blocks of size \(g \times g\), their complexity is \(O\left((M \times N)/g^{2}\right)\), whereas the key generation and channel diffusion phases process every pixel and have complexity \(O\left(M \times N\right)\). Hence, the overall complexity for one frame channel is \(O\left(M \times N\right)\). Since each frame has three channels and the input video has \(K\) frames, the overall complexity of the proposed scheme is \(O\left(M \times N \times K\right)\).
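
As a worked check of this count, summing the phase costs over the three channels and \(K\) frames gives the expression below; the \(352 \times 288\) frame size used for the block count is only an illustrative assumption.

\[
\begin{aligned}
T(M,N,K) &= 3K\left[\,O\!\left(\frac{M \times N}{g^{2}}\right) + O\left(M \times N\right)\right] = O\left(M \times N \times K\right),\\
g = 2^{4} = 16:\quad \frac{352 \times 288}{16^{2}} &= \frac{101\,376}{256} = 396 \text{ blocks per channel.}
\end{aligned}
\]

The constant factor 3 and the lower-order block term do not change the asymptotic order, so the cost grows linearly with the number of pixels per frame and with the number of frames.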

3.16 Comparison with existing methods

To evaluate the efficiency of the proposed scheme, it is compared with other recent encryption schemes in terms of time complexity, execution time, execution time improvement ratio, correlation coefficient, NPCR, UACI, and entropy.

For a fair comparison, the existing schemes were implemented and executed in the same environment. Table 10 shows the time complexity and the average running time over 20 frames of various videos for the proposed scheme and the methods in [13, 16]. Table 11 presents the execution time improvement ratio (ETIR) [44] of the proposed scheme, and Fig. 19 visualizes the execution times. The results show that the proposed scheme is faster than the methods in [13, 16].
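
For reference, an improvement ratio can be computed from the average times in Table 10. The exact ETIR definition of [44] is not restated in this section, so this sketch assumes the common relative-reduction form \(\mathrm{ETIR} = (T_{\text{ref}} - T_{\text{prop}})/T_{\text{ref}} \times 100\%\); the function name and the timing values are hypothetical.

```python
def etir(t_reference: float, t_proposed: float) -> float:
    """Assumed ETIR: percentage reduction in execution time of the proposed
    scheme relative to a reference scheme (positive means faster)."""
    if t_reference <= 0:
        raise ValueError("reference time must be positive")
    return (t_reference - t_proposed) / t_reference * 100.0

# Hypothetical average times in seconds, not values from Table 10.
print(etir(2.0, 0.5))  # 75.0
```

If [44] instead defines the ratio as \(T_{\text{ref}}/T_{\text{prop}}\), only the return expression changes.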

Table 10 Speed comparison
Table 11 Execution time improvement ratio (ETIR)
Fig. 19 Visual execution time

Table 12 presents the average correlation coefficients between adjacent pixels in the horizontal, vertical, and diagonal directions for the proposed scheme and the methods in [19, 35, 36, 45, 46], applied to the Flamingo, Rhinos, Train, and Viptrain videos. The table shows that the proposed scheme yields correlation coefficients closer to zero than those of the compared works. Similarly, Table 13 presents the average correlation coefficients between adjacent pixels in the horizontal, vertical, and diagonal directions for the proposed scheme and the methods in [47,48,49,50], applied to the Foreman video. This table shows that the correlation coefficients of the proposed scheme are likewise close to zero, in line with the compared works.
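
The adjacent-pixel correlation is the Pearson coefficient between pixel pairs one step apart in a given direction. This sketch assumes randomly sampled pairs (3000 by default, a common but not universal choice) and a channel stored as a 2-D list; the helper name is hypothetical.

```python
import math
import random

def adjacent_correlation(img, direction="horizontal", samples=3000, seed=0):
    """Pearson correlation between randomly sampled pixel pairs that are
    adjacent in `direction`; img is a 2-D list of 8-bit values."""
    dr, dc = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(samples):
        r = rng.randrange(len(img) - dr)
        c = rng.randrange(len(img[0]) - dc)
        xs.append(img[r][c])
        ys.append(img[r + dr][c + dc])
    mx, my = sum(xs) / samples, sum(ys) / samples
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

smooth = [[r + c for c in range(64)] for r in range(64)]  # strongly correlated
rng = random.Random(42)
noise = [[rng.randrange(256) for _ in range(256)] for _ in range(256)]  # cipher-like
print(adjacent_correlation(smooth), adjacent_correlation(noise, "diagonal"))
```

A plain image gives values near 1, while a well-encrypted channel should give values near 0, which is the property compared in Tables 12 and 13.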

Table 12 Average correlation coefficient comparison applied on Flamingo, Rhinos, Train, and Viptrain videos

Table 14 presents the average NPCR and UACI values for the proposed scheme and the methods in [19, 35, 36, 45, 46], applied to the Flamingo, Rhinos, Train, and Viptrain videos, and Table 15 presents the average NPCR and UACI values for the proposed scheme and the methods in [47,48,49,50], applied to the Foreman video. The results show that the NPCR and UACI values of the proposed scheme are closer to their optimal values than those of the other related works.
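
NPCR is the percentage of pixel positions that differ between two cipher images, and UACI is the mean absolute intensity difference normalized by 255. The sketch below computes both for flat pixel sequences (the function name is hypothetical) together with their expected values for two independent, uniformly random 8-bit images, which serve as the usual optimal references.

```python
def npcr_uaci(c1, c2):
    """NPCR and UACI (in percent) between two equally sized 8-bit cipher
    images given as flat sequences of pixel values."""
    if len(c1) != len(c2) or len(c1) == 0:
        raise ValueError("cipher images must be non-empty and the same size")
    n = len(c1)
    changed = sum(1 for a, b in zip(c1, c2) if a != b)
    intensity = sum(abs(a - b) for a, b in zip(c1, c2)) / 255
    return 100.0 * changed / n, 100.0 * intensity / n

# Expected values for two independent, uniformly random 8-bit images.
ideal_npcr = (1 - 2 ** -8) * 100                       # 99.609375
ideal_uaci = (256 ** 2 - 1) / (3 * 256) / 255 * 100    # about 33.4635
print(npcr_uaci([0, 0, 0, 0], [0, 255, 0, 255]))       # (50.0, 50.0)
```

In a differential test, `c1` and `c2` are the encryptions of two plain frames that differ in a single pixel.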

Table 16 presents the average entropy values for the proposed scheme and the methods in [19, 35, 36, 45, 46], applied to the Flamingo, Rhinos, Train, and Viptrain videos, and Table 17 presents the average entropy values for the proposed scheme and the methods in [47,48,49,50], applied to the Foreman video. The results show that the entropy values of the proposed scheme are close to the optimal value (8 bits for an 8-bit channel) and compare favorably with those of the other related works.
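
The entropy metric is the Shannon entropy of a channel's histogram, \(H = -\sum_{i} p_i \log_2 p_i\), which reaches 8 bits when all 256 intensity levels are equally likely. A minimal sketch (the function name is hypothetical):

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy in bits of a sequence of 8-bit pixel values."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

flat = list(range(256)) * 100  # perfectly uniform histogram
print(shannon_entropy(flat))   # 8.0
```

Only levels that actually occur contribute, because \(p \log_2 p \to 0\) as \(p \to 0\).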

4 Conclusion

This paper proposes a new scheme for securing colored videos based on frame channel scrambling and multi-key generation from a chaotic map. To increase security, the proposed scheme is applied independently to each of the three channels of every video frame. Its performance is evaluated using visual analysis, histogram, correlation, entropy, differential attack, encryption quality, PSNR, SSIM, and FSIM analyses, chosen-plaintext and known-plaintext attack analysis, edge detection, keyspace, key sensitivity, channel noise attack analysis, occlusion attack analysis, computational processing time, and time complexity. The results show that the proposed scheme encrypts colored videos efficiently and at high speed without requiring high computational resources, making it suitable for IoT devices. Compared with preceding related works, the experiments show that the proposed scheme secures colored videos with high quality.

Table 13 Average correlation coefficient comparison applied on Foreman video
Table 14 NPCR and UACI comparison applied on Flamingo, Rhinos, Train, and Viptrain videos
Table 15 NPCR and UACI comparison applied to Foreman video
Table 16 Entropy comparison applied on Flamingo, Rhinos, Train, and Viptrain videos
Table 17 Entropy comparison applied on Foreman video