1 Introduction

Image watermarking (IW) methods have been thoroughly researched by the signal processing community [10, 17, 34, 44]. IW aims to insert a piece of data into the image of interest (host) so that the inserted data can be retrieved later from the potentially modified host to serve a specific purpose. IW has contributed to a variety of applications, including copyright protection, image authentication, and tampering detection and localization, to name a few [9, 19, 31]. Recently, IW has also been adopted to combat the poisoning of neural networks by malicious training images [37, 39]. More importantly, nowadays anyone with an affordable smart device can capture/copy, modify, and broadcast images at their fingertips. Images can be forged to defame an individual or to gain an upper hand over certain parties. Therefore, it is crucial to be able to verify whether an image genuinely comes from the claimed source [11, 29].

While various IW techniques have been proposed [12, 53], these schemes focus on improving image quality and/or robustness but lack capacity. Among the innovative IW proposals, such as watermarking in the encrypted domain and strategic embedding in different frequency transformation domains [3, 8, 27, 52], saliency based IW methods [4, 23, 55] are particularly interesting. This is because the salient region of an image is arguably its most eye-catching or interesting part, hence an attacker is likely to retain that region of the image. Therefore, researchers consider embedding the watermark in the salient regions of the image. However, such schemes lack payload due to their limited embedding capacity, or they focus on improving only one of the three aspects, i.e., image quality, payload, or robustness [1, 2, 13, 20, 50].

Therefore, this paper proposes a unique blend of saliency based robust IW and homomorphic encryption to simultaneously improve two aspects of IW, namely, the quality of the watermarked image and the watermark capacity, which departs from conventional IW methods. First, we extend Bhowmik et al.'s IW technique by (a) performing fewer DWT decompositions (i.e., only a 2-level decomposition), and (b) exploiting the local and global minima (which represent the image background) for more accurate saliency detection. Next, multiple morphological operations are applied to extract the background of the host image, which is subsequently used to extract the foreground. Our method can embed a watermark that has the same dimension and bit-depth as the host image, which is the first of its kind. Our method also produces high quality watermarked images. Furthermore, the channels of the watermark are encrypted with the Arnold map, Triple DES (TDES), and Multi Flipping Permutation Encoding (MFPE), respectively, and the keys are secured with Okamoto-Uchiyama homomorphic encryption (OUHE) [41, 49]. Experimental results suggest that our proposed method outperforms the current state-of-the-art (SOTA) methods.

The proposed method can be deployed for copyright protection purposes. Furthermore, in the case of multiple watermark embedding, the proposed method can be extended to dual-authentication to realize hierarchical integrity checking. For example, a well-established content creator/company can protect their digital properties (i.e., images) by using their watermark W1. A small-scale startup can then purchase these images from the established company and further watermark them with W2. In the event of a malicious usage/attack, W2 might be destroyed/affected but W1 remains intact. This is where the hierarchical copyright protection feature can contribute. This paper makes the following contributions: 1) embedding an image-based watermark of the same dimension and bit-depth as the host image; 2) securing the decryption keys by using OUHE and MFPE; 3) achieving high imperceptibility and robustness; 4) improving saliency detection, and; 5) an enhancement to embed two or more watermarks (of lower bit-depth) into one host image.

The rest of this paper is organized as follows: Section 2 reviews the related work, Section 3 details our proposed method, Section 4 presents the experiment results and finally, Section 5 concludes this work.

2 Related work

This section reviews two classes of image watermarking methods, namely, 1) transformation based IW methods, and 2) bitplane based IW methods.

2.1 Transformation based image watermarking methods

Our literature survey suggests that DWT is one of the most commonly adopted techniques for watermarking. For example, Qi et al. [38] proposed an IW method to achieve both robustness and quality. First, the host image is divided into 4 × 4 blocks and the resulting blocks are processed by applying a one-way hash function. Next, a 1-level DWT is applied to each block to obtain the subbands LL, LH, HL, and HH. The watermark image is processed by using the Mersenne Twister algorithm (MTA) and the output is subsequently embedded into the LL subbands. In addition, Chen et al. [7] proposed an IW method that embeds a sequence of pseudo-randomly generated bits as the watermark into the DWT-LL subband of the host image. Similarly, Liu et al. [22] embed a watermark of size 64 × 64 bits into the DWT-LL subband of the host image.

Taking a different approach, Liu et al. [23] proposed to divide the host image into N × N blocks, where only the high intensity blocks (i.e., salient image regions) are selected for watermarking. Here, DWT is applied to each block to obtain the subbands, and the watermark is then embedded into the LL subband of the host image. Similarly, Bhowmik et al. [4] proposed a saliency detection based image watermarking method. First, the host image is converted into the YUV color space and then transformed by DWT. Upon performing the subband intensity centering operation, a saliency map is produced, which is used to segment the foreground and background regions of the host image. Next, the background of the watermark (i.e., the secret image) is embedded into the background of the host image with low strength. In contrast, the foreground of the watermark image is embedded into the foreground of the host image with high strength. Likewise, Zhang et al.'s method [55] also relies on a saliency map to embed the watermark. Here, the saliency map is produced from the host image by performing logarithmic quantization. Instead of DWT, the Contourlet Transform (CT) is applied to the host image to obtain the approximation subbands, which in turn serve as the venues to host the watermark image. Another CT-based work is proposed by Najafi et al. [32]. First, a 2-level Sharp Frequency Localized Contourlet Transform (SFLCT) is applied to the host image to obtain the approximation and detailed subbands, and a 1-level SFLCT is applied to the watermark image. Next, Singular Value Decomposition (SVD) is performed on the detailed subbands of the host and watermark images. Finally, the S matrix of the subbands of the host image is replaced by the S matrix of the subbands of the watermark by using α-strength diffusion. There are also watermarking methods based solely on a single transformation using SVD [24, 26]. However, this class of watermarking methods suffers from limited embedding capacity (i.e., restricted to the number of diagonal entries in the S matrix) and low image quality when the embedding rate is high.

It should be noted that other transformations such as Discrete Cosine Transformation (DCT) [16] and Discrete Fourier Transformation (DFT) [33] have also been adopted to embed watermark. Interested readers may refer to these references.

2.2 Bitplane based IW methods

Bitplane based IW methods are usually innovated to serve specific purposes such as authentication and tampering detection. These enhancements are achieved in a few common ways, including (a) spreading the watermark across the host image by means of logistic mapping or permutation, (b) increasing capacity by reserving more bitplanes for watermark embedding purposes, and (c) embedding far less than 1 bit per pixel to achieve high watermarked image quality.

Specifically, Liu et al. [25] applied the logistic map to permute the (binary) watermark, and then embedded the randomized watermark via least significant bit (LSB) embedding. To improve the quality of the watermarked image, Lin et al. [21] and Chang et al. [5] proposed to use absolute moment block truncation coding (AMBTC); both methods embed the watermark into the two LSBs of the host image. To enhance the embedding capacity, Yu et al. [54] reserve 2 bitplanes for embedding the watermark.
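As background, plain LSB embedding of a binary watermark, the mechanism these methods build on, can be sketched as follows (a Python/NumPy illustration; the function names are ours, not from any of the cited works):

```python
import numpy as np

def lsb_embed(host, wm_bits):
    """Replace the least significant bit of each host pixel with a watermark bit."""
    return (host & ~np.uint8(1)) | wm_bits.astype(np.uint8)

def lsb_extract(marked):
    """Recover the watermark bits from the LSB bitplane."""
    return marked & np.uint8(1)
```

Each pixel changes by at most 1, which is why LSB embedding yields high watermarked image quality but is fragile to any pixel-level modification.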

To realize the application of authentication, Molina et al. [30] and Shehab et al. [42] proposed to embed two types of watermark bits, namely, 1) recovery bits (RB), and 2) authentication bits (AB). The RBs are embedded into the LSBs of the halftoned luminance component of the host image, followed by the embedding of ABs into the LSB of the host. Similar to Molina et al. [30], Tohidi et al. [51] and Chen et al. [6] also embed two types of watermark bits. However, both require auxiliary data to operate, which reduces the embedding efficiency, i.e., the number of changes performed per embedded watermark bit.

In addition, there are other innovative ways of using the bitplanes. For example, Singh et al. [45] proposed a fragile watermarking method to improve the quality of the watermarked image. First, the 4th to 8th bitplanes are divided into non-overlapping blocks to generate the ABs. Next, the host image is transformed by using DCT, and the ABs are embedded into the LSBs of the selected AC components of the DCT-transformed host image. Furthermore, Sidiropoulos et al. [43] proposed an LSB based embedding method where randomly generated watermark bits are embedded into the bitplanes of the host image. Similarly, Su et al. [48] scramble the watermark by using Arnold mapping, and the bitplanes of the processed watermark are then embedded into the LSB bitplane of the host image. Prasad et al.'s method [36] follows the same approach as Su et al. [48] but utilises a Logistic map.

Although researchers have put forward various proposals for image watermarking, our literature survey suggests that the embedding capacity remains low due to the trade-off between capacity and watermarked image quality. Therefore, in this work, we aim to improve both capacity and quality simultaneously.

3 Proposed methodology

Given a host color image I, the salient object in I is first determined. Segmentation is then performed to obtain its foreground and background. Next, chaotic symmetric crypto-models and homomorphic encryption are applied to the RGB channels of the watermark W, and the outputs are embedded into I as shown in Fig. 1.

Fig. 1

Proposed saliency model and watermark embedding method

3.1 Salient object detection

We extend the work by Bhowmik et al. [4] by using fewer DWT decomposition levels (2, as opposed to ≥ 3) and by considering only the subtraction of the local maxima. Specifically, in our work, a 2-level DWT is applied and the local maxima are subtracted from the image minima to achieve better foreground segmentation. Moreover, our method applies threshold-based segmentation to the average saliency map generated from the subbands (except the LL subbands), which produces a more accurate binary mask to segment the foreground from the background. Specifically, I is first converted to the YCbCr color space. A 2-level DWT is then applied to the luminance (Y) and chrominance channels (Cb, Cr) by using the Haar kernel filters

$$ \varphi_{(j,k)}(x) = 2^{\frac{j}{2}}\varphi(2^{j}x-k), \qquad \psi_{(j,k)}(x) = 2^{\frac{j}{2}}\psi(2^{j}x-k), $$
(1)

where φ and ψ denote the Haar scaling and wavelet functions, and j and k are the scale and translation parameters, respectively, with x the signal coordinate. This operation generates different levels of sub-bands, including \(LL_{{2}_{\varphi }}\) and \((HL, LH, HH)_{{2}_{\psi }}\). An up-scaling operation is then performed on the \(HL_{{2}_{\psi }}\), \(LH_{{2}_{\psi }}\) and \(HH_{{2}_{\psi }}\) sub-bands to match the size of \(LL_{{1}_{\varphi }}\). Let \(HL_{{2}_{\psi }}^{\uparrow }\), \(LH_{{2}_{\psi }}^{\uparrow }\) and \(HH_{{2}_{\psi }}^{\uparrow }\) denote these up-scaled subbands, respectively. Next, a saliency map S is computed by summing the detail sub-bands of both DWT levels:

$$ \begin{array}{@{}rcl@{}} S(u,v) = HL_{{1}_{\psi}}(u,v)+ LH_{{1}_{\psi}}(u,v)+ HH_{{1}_{\psi}}(u,v)+ \\ HL_{{2}_{\psi}}^{\uparrow}(u,v)+ LH_{{2}_{\psi}}^{\uparrow}(u,v)+ HH_{{2}_{\psi}}^{\uparrow}(u,v). \end{array} $$
(2)

In order to generate a binary mask Ms, a simple thresholding process is applied to the saliency map S, i.e.,

$$ M_{s}(u,v) = \left\{\begin{array}{ll} 1 & S(u,v)\geq\mu; \\ 0 & \text{otherwise}, \end{array}\right. $$
(3)

where μ is the mean of the saliency map S, i.e., the sum of its entries divided by the number of pixels N. To further refine Ms, morphological erosion, hole filling and black-white open-area filtering are performed to remove small objects from the binary image. A closing operation (dilation-then-erosion) is then applied by using a disk structuring element (SE) with 19 neighborhoods (viz., diameter d = 10). Subsequently, the resulting binary image \(M^{\prime }_{s}\) segments the salient (viz., foreground) objects in I by using

$$ I_{F} = I * M^{\prime}_{s}, $$
(4)

where ∗ denotes the element-wise multiplication operation, and the background is produced by subtracting the foreground from the image, i.e.,

$$ I_{B} = I * (1-M^{\prime}_{s}) $$
(5)

Finally, the mask \(M^{\prime }_{s}\) is applied to the watermark image W, i.e., \(W_{F} = W * M^{\prime }_{s}\) and \(W_{B} = W * (1-M^{\prime }_{s})\). The outputs of the intermediate steps are shown in Fig. 2.
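To make the pipeline concrete, the following Python/NumPy sketch computes a 2-level Haar saliency map and thresholds it at its mean as in (3). The paper's implementation is in MATLAB; the absolute-value accumulation of the detail coefficients and the nearest-neighbour up-scaling are our assumptions, and the morphological refinement step is omitted:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar DWT; returns (LL, HL, LH, HH) at half resolution."""
    lo = (x[:, 0::2] + x[:, 1::2]) / 2.0          # row-wise average
    hi = (x[:, 0::2] - x[:, 1::2]) / 2.0          # row-wise difference
    LL = (lo[0::2, :] + lo[1::2, :]) / 2.0
    LH = (lo[0::2, :] - lo[1::2, :]) / 2.0
    HL = (hi[0::2, :] + hi[1::2, :]) / 2.0
    HH = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return LL, HL, LH, HH

def upscale(b, shape):
    """Nearest-neighbour up-scaling of a level-2 subband to the level-1 size."""
    return np.kron(b, np.ones((2, 2)))[:shape[0], :shape[1]]

def saliency_mask(y):
    """Eqs. (1)-(3): sum the detail subbands of a 2-level DWT, then
    binarize the resulting map at its mean."""
    LL1, HL1, LH1, HH1 = haar_dwt2(y)
    LL2, HL2, LH2, HH2 = haar_dwt2(LL1)
    S = (np.abs(HL1) + np.abs(LH1) + np.abs(HH1)
         + upscale(np.abs(HL2), HL1.shape)
         + upscale(np.abs(LH2), HL1.shape)
         + upscale(np.abs(HH2), HL1.shape))
    return (S >= S.mean()).astype(np.uint8)
```

The binary mask (at half resolution here) would then be refined morphologically and up-scaled before segmenting the foreground and background via (4)-(5).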

3.2 Encryption and embedding of watermark

Each color channel of the host image I will be utilized to host one channel of the watermark W. Specifically, the chaotic symmetric crypto-models and the homomorphic encryption proposed by Okamoto and Uchiyama (denoted by OUHE) are applied to all RGB channels of the foreground WF and background WB regions of the watermark W. The outputs are then embedded into IF and IB, respectively. The embedding process is as follows:

Step 1::

The red channel of the foreground of the watermark image \({W_{F}^{R}}\) is embedded into the red channel of the foreground of the host image \({I_{F}^{R}}\) by using a sub-band up-scaling and Haar wavelet based interpolated DWT-SWT approach. First, a 1-level DWT is applied to \({I_{F}^{R}}\) and a 1-level SWT is applied to \({W_{F}^{R}}\). The outputs are denoted by \(\{LL, HL, LH, HH\}_{{I_{F}^{R}}}\) and \(\{LL, HL, LH, HH\}_{{W_{F}^{R}}}\), respectively. Down-scaling is then applied to all sub-bands \(\{LL, HL, LH, HH\}_{{W_{F}^{R}}}\) to match the size of the sub-bands of \({I_{F}^{R}}\). A variant of the Arnold map [15] is then applied to shuffle the pixels in \(\{LL, HL, LH, HH\}_{{W_{F}^{R}}}^{\downarrow }\) by using the positive integer key k1 to generate \(\{LL^{\prime }, HL^{\prime }, LH^{\prime }, HH^{\prime }\}_{{W_{F}^{R}}}^{\downarrow }\), where v is the value at position (x,y):

$$ \left[\begin{array}{c} x^{\prime}\\ y^{\prime} \\ v^{\prime} \end{array}\right] = \left[\begin{array}{ccc} 1 & c_{1} & c_{2} \\ c_{3} & 1+c_{1}c_{3} & c_{2}c_{3} \\ c_{4} & c_{1}c_{2}c_{3}c_{4} & 1+c_{2}c_{4} \end{array}\right]\left[\begin{array}{c} x\\ y\\ v \end{array}\right]\text{mod~}M. $$
(6)

Here, we set \(c_i = k_1\) for \(i = 1,\ldots,4\), and M = 256 so that values are wrapped into [0, 255]. This scrambles the value at position (x,y) to position \((x^{\prime },y^{\prime })\) and changes the value from v to \(v^{\prime }\). The aforementioned processes are repeated for the red channel of the background of the watermark and host images, where the positive integer key k2 is used to generate \(\{LL^{\prime }, HL^{\prime }, LH^{\prime }, HH^{\prime }\}_{{W_{B}^{R}}}\). Finally, α and β are introduced to control the strength of embedding the shuffled sub-bands of the watermark by computing

$$ LL_{I_{F}^{R^{\prime\prime}}} = LL_{{I_{F}^{R}}} + \alpha * LL_{W_{F}^{R^{\prime}}}^{\downarrow} $$
(7)

and

$$ LL_{I_{B}^{R^{\prime\prime}}} = LL_{{I_{B}^{R}}} + \beta * LL_{W_{B}^{R^{\prime}}}^{\downarrow}. $$
(8)

The HL, LH and HH subbands are computed in the same manner. Note that, in general, smaller values of α and β lead to higher imperceptibility. Furthermore, the inverse SWT is applied to the LL, LH, HL, and HH subbands of \(I_{F}^{R^{\prime \prime }}\) and \(I_{B}^{R^{\prime \prime }}\), which form the watermarked red channel \(I^{\prime \prime }_{R} = I_{F}^{R^{\prime \prime }}+ I_{B}^{R^{\prime \prime }} \). Subsequently, {k1, k2} are randomly embedded into the blue channel of I at locations \(\{\vec {x_{1}}, \vec {x_{2}}\}\). In addition, as a form of enhancement, more than one watermark can be embedded into the host image, albeit at a lower bit-depth. Specifically, to embed 2, 4, and 8 watermarks, M = 128, 64, and 32 can be adopted, respectively. However, when embedding more watermarks, the proposed method can only take watermarks of lower quality (i.e., lower bit-depth) as input.
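To illustrate Step 1, the sketch below (Python/NumPy; the paper's code is MATLAB) uses the classic 2-D Arnold cat map as a simplified stand-in for the 3-D variant of (6), with the key acting as the iteration count, followed by the α-strength additive embedding of (7) and its non-blind inversion as used later in (13):

```python
import numpy as np

def arnold_scramble(block, iterations):
    """Simplified 2-D Arnold cat map (a stand-in for Eq. (6)):
    (x, y) -> (x + y, x + 2y) mod n, repeated `iterations` times."""
    n = block.shape[0]
    out = block
    for _ in range(iterations):
        x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        nxt = np.empty_like(out)
        nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out

def arnold_unscramble(block, iterations):
    """Inverse map: read each pixel back from its scrambled position."""
    n = block.shape[0]
    out = block
    for _ in range(iterations):
        x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        nxt = np.empty_like(out)
        nxt[x, y] = out[(x + y) % n, (x + 2 * y) % n]
        out = nxt
    return out

def embed(sub_host, sub_wm, alpha):
    """Eq. (7): additive alpha-strength embedding of a (scrambled) subband."""
    return sub_host + alpha * sub_wm

def extract(sub_marked, sub_host, alpha):
    """Non-blind extraction given the original host subband (cf. Eq. (13))."""
    return (sub_marked - sub_host) / alpha
```

A full round trip (scramble, embed, extract, unscramble) recovers the watermark subband up to floating point error, which is why small α values still permit exact-looking recovery in the non-blind setting.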

Step 2::

The green channel of the watermark partitions \(\{{W_{F}^{G}}, {W_{B}^{G}}\}\) is encrypted by using the TDES [28, 40] crypto-model to achieve confidentiality. According to NIST [28], TDES offers a security level between those of DES and AES. Three 56-bit keys, i.e., k3, k4 and k5, are utilized to encrypt \(\{{W_{F}^{G}}, {W_{B}^{G}}\}\) with modulo M = 256 (as mentioned in (6)) to produce \(\{W_{F}^{G^{\prime }}, W_{B}^{G^{\prime }}\}\). The ciphertext watermark segments are then embedded into the respective segments of the green channel of the host, i.e., \(\{{I_{F}^{G}}, {I_{B}^{G}}\}\). Note that k3 ≠ k4 ≠ k5, and they are randomly embedded in the blue channel of I at locations \(\{\vec {x_{3}}, \vec {x_{4}}, \vec {x_{5}}\}\). Finally, the same parameters α and β (introduced in Step 1) are utilized to control the embedding strength, i.e., \(I_{F}^{G^{\prime \prime }} = {I_{F}^{G}} + \alpha * W_{F}^{G^{\prime }}\) and \(I_{B}^{G^{\prime \prime }} = {I_{B}^{G}} + \beta * W_{B}^{G^{\prime }}\), which are subsequently combined to form the green channel of the watermarked image \(I^{\prime \prime }_{G} = I_{F}^{G^{\prime \prime }}+ I_{B}^{G^{\prime \prime }} \).

Step 3::

The blue channel of the watermark partitions \(\{{W_{F}^{B}}, {W_{B}^{B}}\}\) is shuffled by MFPE [18]. The resulting encrypted partitions are embedded into the respective blue channel partitions of the host, i.e., \(\{{I_{F}^{B}}, {I_{B}^{B}}\}\). Specifically, in MFPE, every 2nd row and every 2nd column of the partitions \(\{{W_{F}^{B}}, {W_{B}^{B}}\}\) are flipped horizontally (i.e., left to right) and vertically (i.e., top to bottom), respectively. Furthermore, a random permutation ϕ (viz., a PRNG-generated series that shuffles the pixel positions) is applied to \(\{{W_{F}^{B}}, {W_{B}^{B}}\}\) to produce \(\{W_{F}^{B^{\prime }}, W_{B}^{B^{\prime }}\}\). In addition, ϕ is stored as the diagonal entries of the singular matrix S (of the same dimensions as the original image) of an SVD decomposition. Next, OUHE is applied to diffuse the processed \(\{W_{F}^{B^{\prime }}, W_{B}^{B^{\prime }}\}\), which are later embedded into \(\{{I_{F}^{B}}, {I_{B}^{B}}\}\). First, we compute the following to generate the public and private keys:

$$ n = p^{2}q, $$
(9)

where p and q are two large prime numbers. We then select g ∈{2,...,n − 1} that satisfies

$$ g^{p-1} \not \equiv 1 \text{ mod } p^{2} $$
(10)

and finally compute

$$ h \equiv g^{n} \text{ mod } n. $$
(11)

Hence, the generated public and private keys are {h,n,g} and {p,q}, respectively. Next, the pixel values are encrypted by using {h,n,g} by computing

$$ c\equiv g^{m}h^{r} \text{ mod } M, $$
(12)

where M is the modulus from Step 1, r is a randomly chosen positive integer, \(m \in W_{F}^{B^{\prime }} \bigcup W_{B}^{B^{\prime }}\), m < p and \(c \in W_{F}^{B^{\prime\prime}}\bigcup W_{B}^{B^{\prime\prime}}\). Further diffusion is performed by using \(I_{F}^{B^{\prime \prime }} = {I_{F}^{B}} + \alpha * W_{F}^{B^{\prime \prime }}\) and \(I_{B}^{B^{\prime \prime }} = {I_{B}^{B}} + \beta * W_{B}^{B^{\prime \prime }}\) to jointly form \(I^{\prime \prime }_{B} = I_{F}^{B^{\prime \prime }}+ I_{B}^{B^{\prime \prime }}\). Subsequently, the final watermarked image is produced by computing \(I_{W} = I^{\prime \prime }_{R}+I^{\prime \prime }_{G}+I^{\prime \prime }_{B}\). A sample watermarked image is shown in Fig. 2.
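The MFPE shuffling of Step 3 can be sketched as follows (a Python/NumPy illustration under our reading of the description; the seeded permutation stands in for the PRNG series ϕ, and storing ϕ in the SVD singular matrix is omitted):

```python
import numpy as np

def mfpe(block, seed):
    """Multi Flipping Permutation Encoding (sketch): flip every 2nd row
    left-to-right and every 2nd column top-to-bottom, then shuffle the
    pixel positions with a seeded permutation phi."""
    out = block.copy()
    out[1::2, :] = out[1::2, ::-1].copy()     # horizontal flips
    out[:, 1::2] = out[::-1, 1::2].copy()     # vertical flips
    phi = np.random.default_rng(seed).permutation(out.size)
    return out.ravel()[phi].reshape(out.shape), phi

def mfpd(block, phi):
    """Multi Flipping Permutation Decoding: undo phi, then undo the flips
    (each flip is an involution, so re-applying it inverts it)."""
    flat = np.empty(block.size, dtype=block.dtype)
    flat[phi] = block.ravel()
    out = flat.reshape(block.shape)
    out[:, 1::2] = out[::-1, 1::2].copy()
    out[1::2, :] = out[1::2, ::-1].copy()
    return out
```

Decoding applies the inverse operations in reverse order, so an intact ϕ and ciphertext yield exact recovery.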

Fig. 2

The intermediate images produced by the proposed watermarking method

3.3 Decryption and extraction of watermark

First, the proposed saliency detection method is applied to segment the watermarked image IW and the host image I into foreground and background partitions, denoted by \(\{I_{W_{F}}, I_{W_{B}}\}\) and {IF, IB}, respectively. The encrypted foreground and background of the blue channel of the watermark are extracted by computing

$$ W_{F}^{B^{\prime}} = \frac{I^{B}_{W_{F}}- {I_{F}^{B}}}{\alpha}, $$
(13)

and

$$ W_{B}^{B^{\prime}} = \frac{I^{B}_{W_{B}}- {I_{B}^{B}}}{\beta}. $$
(14)

Next, \(W_{F}^{B^{\prime }}\) is decrypted by using p and q by calculating

$$ A = \frac{((W_{F}^{B^{\prime}})^{ew(p-1)}\mod p^{2})-1}{p} $$
(15)

and

$$ B = \frac{(g^{p-1}\mod p^{2})-1}{p}, $$
(16)

followed by

$$ B^{\prime} = \delta(B)\mod p, $$
(17)

where δ(B) refers to the matrix containing the reciprocal of each element of B.Footnote 1 Note that in (15), \(X^{ew(p-1)}\) refers to the output of raising each element in X to the power of p − 1 (i.e., an element-wise exponentiation). Finally, the decrypted watermark \(M = W_{F}^{B^{\prime \prime }}\) is computed by using the corresponding decryption function for OUHE:

$$ M = (A \otimes B^{\prime})\mod p, $$
(18)

where ⊗ denotes the element-wise multiplication. In a similar manner, \(W_{B}^{B^{\prime \prime }}\) is obtained by replacing \(W_{F}^{B^{\prime }}\) with \(W_{B}^{B^{\prime }}\) in (15).
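The OUHE key generation (9)-(11), encryption (12), and decryption (15)-(18) can be traced with a toy Python example. The primes are illustrative only; the standard Okamoto-Uchiyama modulus n is used in the encryption step, and the modular inverse of B plays the role of δ(B) mod p:

```python
p, q = 7, 11                        # toy primes; real use needs large primes
n = p * p * q                       # Eq. (9): n = p^2 * q

g = 2
while pow(g, p - 1, p * p) == 1:    # Eq. (10): g^(p-1) must not be 1 mod p^2
    g += 1
h = pow(g, n, n)                    # Eq. (11): h = g^n mod n

def L(x):
    """The (x - 1)/p step shared by Eqs. (15) and (16)."""
    return (x - 1) // p

def encrypt(m, r):
    """Eq. (12) with the standard modulus n; r is a random blinding integer."""
    return (pow(g, m, n) * pow(h, r, n)) % n

def decrypt(c):
    a = L(pow(c, p - 1, p * p))     # Eq. (15)
    b = L(pow(g, p - 1, p * p))     # Eq. (16)
    return (a * pow(b, -1, p)) % p  # Eqs. (17)-(18): m = a * b^{-1} mod p

m = 5                               # OUHE requires plaintext m < p
assert decrypt(encrypt(m, r=123)) == m
```

Note that decryption does not depend on the blinding value r, which is what makes the scheme probabilistic yet exactly invertible for any m < p.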

Furthermore, SVD is applied to extract the random permutation order ϕ from the diagonal entries of the singular matrix. Multi Flipping Permutation Decoding (MFPD) is then applied to both \(W_{F}^{B^{\prime \prime }}\) and \(W_{B}^{B^{\prime \prime }}\) to produce the decoded blue partitions \({W_{F}^{B}}\) and \({W_{B}^{B}}\), which are then put together to form the blue channel of the watermark \(W^{B} = {W_{F}^{B}}+{W_{B}^{B}}\). Subsequently, \(W_{F}^{G^{\prime }}\) and \(W_{B}^{G^{\prime }}\) are extracted from the green watermarked partitions, where

$$ W_{F}^{G^{\prime}} = \frac{I^{G}_{W_{F}}- {I_{F}^{G}}}{\alpha} $$
(19)

and

$$ W_{B}^{G^{\prime}} = \frac{I^{G}_{W_{B}}- {I_{B}^{G}}}{\beta}. $$
(20)

Moreover, the keys {k3, k4, k5} are extracted from the blue channel at locations \(\{\vec{x_3}, \vec{x_4}, \vec{x_5}\}\). The TDES decryption modulo 256 operation is performed on \(W_{F}^{G^{\prime }}\) and \(W_{B}^{G^{\prime }}\) to produce the decoded green partitions \({W_{F}^{G}}\) and \({W_{B}^{G}}\), which jointly form WG. Subsequently, the keys {k1, k2} are extracted from the blue channel at locations \(\{\vec{x_1}, \vec{x_2}\}\). Then,

$$ W_{F}^{R^{\prime}} = \frac{I^{R}_{W_{F}}- {I_{F}^{R}}}{\alpha} $$
(21)

and

$$ W_{B}^{R^{\prime}} = \frac{I^{R}_{W_{B}}- {I_{B}^{R}}}{\beta} $$
(22)

are computed. Next, the inverse Arnold map is applied to \(W_{F}^{R^{\prime }}\) and \(W_{B}^{R^{\prime }}\) by using {k1, k2} to produce the decoded red partitions \({W_{F}^{R}}\) and \({W_{B}^{R}}\), which jointly form the red component of the watermark \(W^{R}={W_{F}^{R}}+{W_{B}^{R}}\). Finally, the extracted watermark \(W^{\prime }\) is formed by computing

$$ W^{\prime}=W^{R}+W^{G}+W^{B}. $$
(23)

An example of \(W^{\prime }\) is shown in Fig. 2(j).

4 Experiments

The proposed watermarking method is implemented in MATLAB 2020 running on a Core i7 (7th Gen) 7500U 2.9 GHz processor with 16 GB RAM. The standard test images from the SIPI and MSRA datasets (10K images) [4] are considered for evaluation. Here, the MSRA images and watermarks are resized to 512 × 512 × 3 using the MATLAB function imresize. For all experiments conducted, α = 0.04 and β = 0.02 are set, which makes the background slightly blurry while the salient object remains completely imperceptible. We utilize the 24-bit image shown in Fig. 3 as the watermark and embed it into the host image. For evaluation purposes, we consider the following scenarios: (a) the PSNR and aSSIM [4] of the watermarked image after embedding a watermark, i.e., a 24-bit color image W or a binary image Wb; (b) the aSSIM of the extracted W; and (c) the normalized correlation (NC) of the extracted Wb. Here, aSSIM refers to the average of the SSIM values over the RGB channels. In addition, for the case of the binary watermark Wb, we embed the same binary watermark into all three color channels of the host image.

Fig. 3

The watermark images used in the experiments. Each binary watermark is the most significant bitplane of the grayscale image of its color image counterpart

4.1 Saliency detection

First, the performance of the proposed salient object detection (SOD) method is evaluated. Some representative results are shown in Fig. 4. By visual inspection, it is verified that the proposed method is able to identify the salient objects (regions). It produces a boundary that confines the salient object with a wider dynamic range. For example, for the bird image (3rd column), our method detects the feathers and beak completely, and the detected objects are brighter than those of [4]. To quantify the results, the Mean Absolute Error (MAE), F1-score, and Area Under the Receiver Operating Characteristic curve (AUROC) are recorded in Table 1.Footnote 2 Specifically, the metrics are computed as follows:

$$ MAE=\frac{1}{M \times N} {\sum\limits_{x}^{M}}{\sum\limits_{y}^{N}} {|S(x,y)-G(x,y)|}, $$
(24)

where S and G are the saliency map and the ground truth, respectively. On the other hand, F1 is defined as:

$$ F1 =\frac{TP}{TP + 0.5 \times (FP + FN)}, $$
(25)

where TP, FN and FP are the numbers of true positives, false negatives, and false positives, respectively. Likewise, the false positive rate (FPR), i.e., one minus the specificity, is defined as:

$$ FPR = 1 - \frac{TN}{TN+FP}, $$
(26)

where TN refers to the number of true negatives. On average, 0.035, 0.790 and 0.900 are attained for MAE, F1 and AUROC, respectively. Here, we also consider the results of Peng et al.'s method [35] and Singh et al.'s method [46]. The results suggest that our proposed method outperforms these SOTA SOD methods by at least 66% in MAE, 60% in F1, and 53% in AUROC. Therefore, we conclude that our salient object detection method outperforms [4, 35] and [46].
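As an illustration of (24)-(26), these metrics can be computed on binarized maps as follows (a Python/NumPy sketch; the function names are ours):

```python
import numpy as np

def mae(S, G):
    """Eq. (24): mean absolute error between saliency map and ground truth."""
    return np.abs(S.astype(float) - G.astype(float)).mean()

def f1_and_fpr(pred, gt):
    """Eqs. (25)-(26) on binarized maps (pred, gt with entries in {0, 1})."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    f1 = tp / (tp + 0.5 * (fp + fn))
    fpr = 1 - tn / (tn + fp)        # FPR = 1 - specificity
    return f1, fpr
```

Sweeping the binarization threshold and plotting the true positive rate against this FPR yields the ROC curve whose area is the AUROC reported in Table 1.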

Fig. 4

Comparison of saliency map produced by our proposed method and Bhowmik et al.’s method [4] . The original images, saliency map by our proposed method and [4] are shown in the 1st, 2nd and 3rd row, respectively

Table 1 Performance of the generated saliency maps. Results are presented in the format of “proposed”/“Peng et al. [35]”/“Singh et al. [46]” method for MSRA dataset

4.2 Quality of watermarked image

The quality of the watermarked image after embedding the 24-bit color watermark W and the binary watermark Wb is evaluated in terms of PSNR and aSSIM. The results are recorded in Table 2. The PSNR values of the watermarked image after embedding W and Wb are consistently ≥ 43dB and ≥ 54dB, respectively. It is noticed that, in comparison to embedding Wb, the quality of the watermarked image is significantly lower after embedding W. This is because embedding W affects all bitplanes in all channels of the host image, whereas embedding Wb modifies only one bitplane in each channel. Part of the difference is also due to precision errors in the floating point representation and its related operations. A similar observation is made for the aSSIM values when embedding W and Wb: the mean aSSIM of the watermarked image for W is ≥ 0.9986, whereas the aSSIM for Wb is ≥ 0.9999.

Table 2 Quality analysis of the watermarked image after embedding the color watermark W and the binary watermark Wb (format: PSNR/aSSIM) for the proposed method and [4, 23, 55]

To investigate how the quality of the watermarked image changes under different parameter settings, results are collected by varying α and β within the range [0.01, 0.2]. The results are summarized in Fig. 5. In the case of embedding W, the PSNR value ranges from 5 to 48dB, while the aSSIM ranges from 0.5612 to 0.9995. For embedding Wb, the observed PSNR value ranges from 15 to 56dB, and the aSSIM value ranges from 0.5789 to 0.9999. It is apparent that the quality of the watermarked image decreases when either α or β increases, and vice versa (Fig. 6).

Fig. 5

PSNR and aSSIM graphs of the watermarked image with various α and β

Fig. 6

Attacks on watermarked Leena (from SIPI) and Flower image (from MSRA dataset)

When compared to the conventional SOTA methods, we focus on the results collected for embedding Wb because the conventional methods embed a binary image as the watermark. The results in Table 2 suggest that the PSNR attained by our proposed method is the highest, i.e., + 18dB over [4], and + 13dB over both [23, 55]. On the other hand, the aSSIM values show mixed performance. Compared to [23, 55], our proposed method performs better by 8%, while it is on par with [4]. It is noteworthy that our proposed method still outperforms [23, 55] when embedding W, i.e., a significantly larger payload (watermark) size. In addition, our proposed method (embedding W) is only marginally inferior to [4] (embedding Wb). These results suggest that, despite embedding highly diffused watermark images of the same dimension and bit-depth as the host image, the watermarked image produced by our proposed method is of high quality. Hence, the proposed method produces higher quality watermarked images than the conventional saliency-based IW methods [4, 23, 55]. In addition, for the SOTA methods considered in this work [5, 21, 47], the binary watermark image is embedded into the LSB bitplane of the host images, which makes them vulnerable to malicious attacks and hence less robust.

4.3 Robustness of the embedded watermark

This section reports the robustness of the embedded watermark by calculating aSSIM and Normalized Correlation (NC). Specifically, NC is computed as follows:

$$ NC=\frac{{\sum}_{i=1}^{M}{\sum}_{j=1}^{N}(W_{(i,j)}-\mu_{W})(W^{\prime}_{(i,j)}-\mu_{W^{\prime}})}{\sqrt{{\sum}_{i=1}^{M}{\sum}_{j=1}^{N}(W_{(i,j)}-\mu_{W})^{2}}\sqrt{{\sum}_{i=1}^{M}{\sum}_{j=1}^{N}(W^{\prime}_{(i,j)}-\mu_{W^{\prime}})^{2}}}, $$
(27)

where μW and \(\mu _{W^{\prime }}\) are the means of the original watermark W and the extracted watermark \(W^{\prime }\), respectively. The proposed method is evaluated by embedding the watermark in two different ways, namely, a) embedding a 24-bit watermark W, and b) embedding a binary watermark Wb. Various attacks, including mean-filtering, median-filtering, shearing, noise, rotation, cropping and JPEG compression, are applied to the watermarked images for both scenarios and the results are recorded. Table 3 records the aSSIM values for the 24-bit watermark W, and Table 4 records the NC values for the binary watermark Wb.
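Equation (27) is the zero-mean normalized cross-correlation; a minimal NumPy sketch:

```python
import numpy as np

def nc(W, Wp):
    """Eq. (27): normalized correlation between the original watermark W
    and the extracted watermark Wp; ranges over [-1, 1]."""
    a = W.astype(float) - W.mean()
    b = Wp.astype(float) - Wp.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())
```

A value of 1 indicates perfect extraction, 0 no correlation, and negative values an inverted watermark.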

Table 3 The aSSIM values of the extracted 24-bit color watermark
Table 4 NC results after applying various attacks on watermarked Leena

First, we consider scenario a), where a 24-bit watermark is embedded. When no attack is applied to the watermarked image, the aSSIM value is 0.9999, which suggests that the extracted watermarks are of high quality. When any form of attack is applied, the aSSIM value drops, but the average aSSIM of the extracted W remains high at 0.9300. The proposed method appears to be particularly robust against mean-filtering and noise attacks. As expected, the aSSIM value for cropping is particularly low because certain image information is completely removed from the watermarked image. Since the proposed method is the only known method that embeds a 24-bit color image as the watermark, there are no existing methods available for comparison.

Next, we consider scenario (b), where a binary watermark (Fruits and Male, as shown in Fig. 3(b) and (d), respectively) is embedded, and the results are recorded in Table 4. Here, the majority-vote strategy is adopted because three copies of the same watermark (one from each of the RGB channels) are available. In addition, the discussion here is based on the watermarked Leena image because it is the only image commonly evaluated by all the existing methods considered for the comparison. Among the seven types of attacks, the results suggest that Jiang et al.'s method [14] is the most robust IW method against both mean-filter and median-filter attacks, being, on average, ≥ 6% higher than the rest. On the other hand, Zhang et al.'s method [55] outperforms the other methods for the rotation and JPEG-compression attacks, being, on average, ≥ 5% higher than the rest. For the remaining three attacks, namely shearing, noise, and cropping, our proposed method exhibits the highest resistance, with an average margin of 4%. It is noteworthy that the proposed method is originally designed to embed a 24-bit watermark, but it has been modified here to embed a binary image solely for comparison purposes, because most conventional methods can only embed a binary image as the watermark.
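The majority-vote strategy mentioned above can be sketched as follows: each pixel of the recovered binary watermark takes the value that at least two of the three channel-wise copies agree on (the function name is ours):

```python
import numpy as np

def majority_vote(wb_r, wb_g, wb_b):
    """Combine the three binary watermark copies extracted from the R, G,
    and B channels: a pixel is 1 iff at least two of the three copies vote 1,
    which corrects any single-channel extraction error at that pixel."""
    votes = (wb_r.astype(np.int32) +
             wb_g.astype(np.int32) +
             wb_b.astype(np.int32))
    return (votes >= 2).astype(np.uint8)
```

This is why embedding the same binary watermark into all three channels improves robustness: an attack must corrupt the same pixel in at least two channels before the recovered bit flips.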

To complete the discussion, the aSSIM and NC values of the extracted watermarks are analyzed under different parameter values. The results are shown in Fig. 8 for various values of α and β. Similar to the observations on the quality of the watermarked image, the quality of the extracted watermark is higher when either α or β is small, and vice versa (see Figs. 7 and 8). However, for security reasons, one should not always choose the smallest α and β when using the proposed method, even though these settings yield the best quality for both the watermarked image and the extracted watermark.

Fig. 7

The first and third rows show the extracted color watermark W from the watermarked image Leena. The second and fourth rows show the extracted binary watermark Wb

Fig. 8

The aSSIM and NC graphs for the extracted watermarks for various values of α and β

5 Conclusion

In this work, we proposed a saliency-based image watermarking method that improves upon Bhowmik et al.'s salient object detection method [4]. Specifically, a salient object detection model is first applied to extract the visually attentive area in the host image and generate a saliency mask. This mask is then used to separate the foreground and background of the host and watermark images. Then, the red, green, and blue watermark channels are encrypted using Arnold, TDES, and MFPE, respectively. Furthermore, the principal key is embedded in the singular diagonal of the blue channel, from which all dependent keys can subsequently be derived. Next, the blue channel is encrypted using Okamoto-Uchiyama homomorphic encryption. These scrambled and encrypted watermark channels are then embedded into the respective host channels. Unlike the conventional methods that hide a binary image in the host image, the proposed method can embed a 24-bit image of the same dimension as the host. In addition, more than one watermark can be embedded when the need arises, albeit at the expense of lower watermark quality. Analysis results also indicate that the proposed method outperforms SOTA methods in terms of imperceptibility, payload, and robustness.

As future work, we want to analyze the effect of different wavelets and decomposition levels on the performance of the proposed watermarking method. Furthermore, for the case of embedding multiple watermarks, high-resolution reconstruction of the watermark images along with contrast enhancement can be explored to extract higher quality watermarks.