High payload watermarking based on enhanced image saliency detection

Nowadays, images are circulated rapidly over the internet and are subject to the risk of misuse. To address this issue, various watermarking methods have been proposed in the literature. However, most conventional methods settle for a trade-off between imperceptibility and payload capacity, and they are not able to improve both criteria simultaneously. Therefore, in this paper, a robust saliency-based image watermarking method is proposed to achieve high payload and high quality watermarked images. First, an enhanced salient object model is proposed to produce a saliency map, followed by a binary mask that segments the foreground/background regions of a host image. The same mask is then consulted to decompose the watermark image. Next, the RGB channels of the watermark are encrypted by using Arnold, 3-DES and multi-flipping permutation encoding (MFPE). Furthermore, the principal key used for encryption is embedded in the singular matrix of the blue channel. Moreover, the blue channel is encrypted by using the Okamoto-Uchiyama homomorphic encryption (OUHE) method. Finally, these encrypted watermark channels are diffused and embedded into the host channels. When the need arises, more watermarks can be embedded into the host at the expense of the quality of the embedded watermarks. Our method can embed a watermark of the same dimension as the host image, which is the first of its kind. Experimental results suggest that the proposed method maintains robustness while achieving high image quality and high payload. It also outperforms the state-of-the-art (SOTA) methods.


Introduction
Image watermarking (IW) methods have been thoroughly researched by the signal processing community [10,17,34,44]. IW aims to insert a piece of data into the image of interest (host) so that the inserted data can be retrieved later from the potentially modified host to serve a specific purpose. IW has contributed to a variety of applications including copyright protection, image authentication, and tampering detection and localization, to name a few [9,19,31]. Recently, IW has also been adopted to combat the poisoning of neural networks by malicious training images [37,39]. More importantly, nowadays anyone with a smart device, which is very affordable, can capture/copy, modify, and broadcast images at their fingertips. Images can be forged to defame certain individuals or to gain an upper hand over certain parties. Therefore, it is crucial to be able to verify whether an image genuinely comes from the claimed source [11,29].
While various IW techniques have been proposed [12,53], these schemes focus on improving the image quality and/or robustness but they lack capacity. Among the innovative IW proposals such as watermarking in the encrypted domain and strategic embedding in different frequency transformation domains [3,8,27,52], saliency-based IW methods [4,23,55] are particularly interesting. This is because the salient region of an image is arguably the most eye-catching or interesting region, hence it is likely that the attacker will retain that region of the image. Therefore, researchers consider embedding the watermark in the salient regions of the image. However, such schemes either fall short in payload due to their limited embedding capacity, or they focus on improving only one of the three aspects, i.e., image quality, payload, or robustness [1,2,13,20,50].
Therefore, this paper proposes a unique blend of saliency-based robust IW and homomorphic encryption to simultaneously improve two aspects of IW, namely, the quality of the watermarked image and the watermark capacity, which departs from the conventional IW methods. First, we extend Bhowmik et al.'s IW technique [4] by (a) performing fewer DWT decompositions (i.e., only a 2-level decomposition), and (b) exploiting the local and global minima (that represent the image background) for more accurate saliency detection. Next, multiple morphological processes are applied to extract the background of the host image, which is subsequently used in extracting the foreground. Our method can embed a watermark that has the same dimension and bit-depth as the host image, which is the first of its kind. Our method also produces high quality watermarked images. Furthermore, each channel of the watermark is encrypted with Arnold, Triple DES (TDES), and Multi Flipping Permutation Encoding (MFPE), and the keys are secured with OUHE [41,49]. Experimental results suggest that our proposed method outperforms the current state-of-the-art (SOTA) methods.
The proposed method can be deployed for copyright protection purposes. Furthermore, in the case of multiple watermark embedding, the proposed method can be enhanced for dual authentication to realize hierarchical integrity checking. For example, a well-established content creator/company can protect its digital properties (i.e., images) by using its watermark $W_1$. A new small-scale startup can then purchase these images from the established company and further watermark them with $W_2$. In case of any malicious usage/attack, $W_2$ might be destroyed/affected but $W_1$ remains intact. This is when the hierarchical copyright protection feature can contribute. This paper makes the following contributions: 1) embedding an image-based watermark which is of the same dimension and bit-depth as the host image; 2) securing the keys for decryption by using OUHE and MFPE; 3) achieving high imperceptibility and robustness; 4) improving saliency detection, and; 5) an enhancement to embed two or more watermarks (but of lower bit-depth) into one host image.
The rest of this paper is organized as follows: Section 2 reviews the related work, Section 3 details our proposed method, Section 4 presents the experiment results and finally, Section 5 concludes this work.

Related work
This section reviews two classes of image watermarking methods, namely, 1) transformation based IW methods, and 2) bitplane based IW methods.

Transformation based image watermarking methods
Our literature survey suggests that DWT is one of the most commonly adopted techniques to realize watermarking. For example, Qi et al. [38] proposed an IW method to achieve both robustness and quality. First, the host image is divided into 4 × 4 blocks and the resulting blocks are processed by applying a one-way hash function. Next, 1-level DWT transform is applied on each block to obtain the subbands LL, LH, HL, and HH. The watermark image is processed by using Mersenne Twister algorithm (MTA) and the output is subsequently embedded into the LL subbands. In addition, Chen et al. [7] proposed an IW method to embed a sequence of pseudo-randomly generated bits as the watermark into the DWT-LL subband of the host image. Similarly, Liu et al. [22] embed a watermark of size 64 × 64 bits into the DWT-LL subband of the host image.
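To make the subband terminology used above concrete, the following is a minimal sketch of a 1-level 2D Haar DWT applied to a single 4 × 4 block. It is an illustration only and not the implementation of any of the cited methods; the function names and the LH/HL naming convention are assumptions made for this example.

```python
def haar_dwt2(block):
    """One-level 2D Haar DWT of a square block with even side length.

    Returns the subbands (LL, LH, HL, HH), each half the size of the input.
    Which detail band is called LH and which HL varies between texts;
    the choice below is an assumption.
    """
    half = len(block) // 2
    LL = [[0.0] * half for _ in range(half)]
    LH = [[0.0] * half for _ in range(half)]
    HL = [[0.0] * half for _ in range(half)]
    HH = [[0.0] * half for _ in range(half)]
    for i in range(half):
        for j in range(half):
            a = block[2 * i][2 * j]
            b = block[2 * i][2 * j + 1]
            c = block[2 * i + 1][2 * j]
            d = block[2 * i + 1][2 * j + 1]
            LL[i][j] = (a + b + c + d) / 2  # local average (approximation)
            LH[i][j] = (a - b + c - d) / 2  # difference between columns
            HL[i][j] = (a + b - c - d) / 2  # difference between rows
            HH[i][j] = (a - b - c + d) / 2  # diagonal difference
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2: rebuilds the original block from its subbands."""
    half = len(LL)
    block = [[0.0] * (2 * half) for _ in range(2 * half)]
    for i in range(half):
        for j in range(half):
            ll, lh, hl, hh = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            block[2 * i][2 * j] = (ll + lh + hl + hh) / 2
            block[2 * i][2 * j + 1] = (ll - lh + hl - hh) / 2
            block[2 * i + 1][2 * j] = (ll + lh - hl - hh) / 2
            block[2 * i + 1][2 * j + 1] = (ll - lh - hl + hh) / 2
    return block
```

A flat block has all of its energy in LL and zeros in the detail subbands, which is one reason why LL is a popular venue for robust embedding.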
Taking a different approach, Liu et al. [23] propose to divide the host image into N × N blocks, but only the high intensity blocks (i.e., salient image region) are selected for watermarking purposes. Here, DWT is applied on each block to obtain the subbands. The watermark is then embedded into the LL subband of the host image. Similarly, Bhowmik et al. [4] proposed a saliency detection based image watermarking method. First, the host image is converted into the YUV color space and it is then transformed by performing DWT. Upon performing the subband intensity centering operation, a saliency map is produced, which is used in segmenting the foreground and background regions of the host image. Next, the background of the watermark (i.e., secret image) is embedded into the background of the host image with low strength. In contrast, the foreground of the watermark image is embedded in the foreground of the host image with high strength. Likewise, Zhang et al.'s method [55] also relies on a saliency map to embed the watermark. Here, the saliency map is produced from the host image by performing logarithmic quantization. Instead of using DWT, Contourlet Transformation (CT) is applied to the host image to get the approximated subbands, which in turn serve as the venues to host the watermark image. Another work based on CT is proposed by Najafi et al. [32]. First, a 2-level Sharp Frequency Localized Contourlet Transformation (SFLCT) is applied to the host image to obtain the approximation and detailed subbands. A 1-level SFLCT is also applied to the watermark image. Next, Singular Value Decomposition (SVD) is performed on the detailed subbands of the host and the watermark images. Finally, the S matrix of the subbands of the host image is replaced by the S matrix of the subbands of the watermark by using α-strength diffusion. There are also watermarking methods that are based solely on a single transformation using SVD [24,26].
However, this class of watermarking methods suffers from limited embedding capacity (i.e., restricted to the number of diagonal entries in the S matrix) and low image quality when the embedding rate is high.
It should be noted that other transformations such as Discrete Cosine Transformation (DCT) [16] and Discrete Fourier Transformation (DFT) [33] have also been adopted to embed watermark. Interested readers may refer to these references.

Bitplane based IW methods
Bitplane based IW methods are usually innovated to serve specific purposes such as authentication and tampering detection. These enhancements are achieved in a few common ways, including (a) spreading the watermark across the host image by means of logistic mapping or permutation, (b) increasing capacity by reserving more bitplanes for watermark embedding purposes, and (c) embedding far less than 1 bit per pixel to achieve high watermarked image quality.
Specifically, Liu et al. [25] applied the logistic map to permute the (binary) watermark, and then embedded the randomized watermark via least significant bit (LSB) embedding.
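The general idea of scrambling a binary watermark with a logistic map before LSB embedding can be sketched as follows. This is a simplified illustration and not the algorithm of [25]: the permutation is derived by ranking a logistic sequence, and the seed `x0`, the control parameter `r`, and all function names are hypothetical choices made for this example.

```python
def logistic_permutation(length, x0=0.3141, r=3.99):
    """Derive a permutation of range(length) from the logistic map.

    x0 and r act as a secret key here (assumed values). The permutation
    is obtained by ranking the chaotic sequence x_{k+1} = r * x_k * (1 - x_k).
    """
    x = x0
    seq = []
    for _ in range(length):
        x = r * x * (1.0 - x)
        seq.append(x)
    return sorted(range(length), key=lambda i: seq[i])


def embed_lsb(pixels, bits, perm):
    """Write the permuted watermark bits into the LSBs of the first pixels."""
    out = list(pixels)
    for k, p in enumerate(perm):
        out[k] = (out[k] & ~1) | bits[p]  # clear the LSB, then set it to the bit
    return out


def extract_lsb(pixels, perm):
    """Read the LSBs back and undo the permutation."""
    bits = [0] * len(perm)
    for k, p in enumerate(perm):
        bits[p] = pixels[k] & 1
    return bits


watermark = [1, 0, 1, 1, 0, 0, 1, 0]
host = [100, 101, 102, 103, 104, 105, 106, 107]
key = logistic_permutation(len(watermark))
stego = embed_lsb(host, watermark, key)
recovered = extract_lsb(stego, key)
```

Each host pixel changes by at most one intensity level, which explains the high image quality of LSB schemes, while the same property makes the watermark fragile against any processing that disturbs the lowest bitplane.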
To improve the quality of the watermarked image, Lin et al. [21] and Chang et al. [5] proposed to use absolute moment block truncation coding (AMBTC). Both methods embed the watermark into two LSBs of the host image. To enhance the embedding capacity, Yu et al. [54] reserve 2 bitplanes to embed the watermark.
To realize the application of authentication, Molina et al. [30] and Shehab et al. [42] proposed to embed two types of watermark bits, namely, 1) recovery bits (RBs), and 2) authentication bits (ABs). The RBs are embedded into the LSBs of the halftoned luminance component of the host image, followed by the embedding of ABs into the LSB of the host. Similar to Molina et al. [30], Tohidi et al. [51] and Chen et al. [6] also embed two types of watermark bits. However, they both require auxiliary data to operate, which reduces the embedding efficiency, i.e., the number of changes performed per embedded watermark bit.
In addition, there are other innovative ways of using the bitplanes. For example, Singh et al. [45] proposed a fragile watermarking method to improve the quality of the watermarked image. First, the 4th to 8th bitplanes are divided into non-overlapping blocks to generate the ABs. Next, the host image is transformed by using DCT and the ABs are embedded into the LSBs of the selected AC components of the DCT-transformed host image. Furthermore, Sidiropoulos et al. [43] proposed an LSB based embedding method where the randomly generated watermark bits are embedded into the bitplanes of the host image. Similarly, Su et al. [48] scramble the watermark by using Arnold mapping and the bitplanes of the processed watermark are then embedded into the LSB bitplane of the host image. Prasad et al.'s method [36] follows the same approach put forward by Su et al. [48] but utilises the logistic map instead.
Although researchers have put forward various proposals for image watermarking purposes, based on our literature survey, it is noticed that the embedding capacity remains low due to the dependency between capacity and watermarked image quality. Therefore, in this work, we aim to improve both capacity and quality simultaneously.

Proposed methodology
Given a host color image I, the salient object in I is first determined. Segmentation is then performed to obtain its foreground and background. Next, chaotic symmetric crypto-models and homomorphic encryption are applied to the RGB channels of the watermark W and the outputs are embedded into I as shown in Fig. 1.

Salient object detection
We extend the work by Bhowmik et al. [4] by using fewer DWT decomposition levels (as opposed to ≥ 3 levels) and considering only the subtraction of local maxima. In particular, a 2-level DWT is applied and the local maxima are subtracted from the image minima to achieve better foreground segmentation. Moreover, our method applies threshold-based segmentation to the average saliency maps generated from the subbands (except the LL subbands), which produces a more accurate binary mask to segment the foreground from the background.
Specifically, I is first converted to the $YC_bC_r$ color space. A 2-level DWT is then applied to the luminance ($Y$) and chrominance ($C_b$, $C_r$) channels by using the Haar kernel filter. In order to generate a binary mask $M_s$, a simple thresholding process is applied to the saliency map $S$ with respect to μ, where μ = ΣS/N is the cumulative mean of the map $S$. For further refinement of $M_s$, morphological erosion, hole filling and black-white open-area filtering are performed to remove small objects from the binary image. A closing operation (dilation-then-erosion) is then applied by using a disk structuring element (SE) with 19 neighborhoods (viz., diameter, d = 10). Subsequently, the resulting binary image $M_s$ segments the salient (viz., foreground) objects in I by using $I_F = I * M_s$, where * denotes the element-wise multiplication operation, and the background is produced by subtracting the foreground from the image, i.e., $I_B = I - I_F$.
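The thresholding and segmentation steps described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact procedure: pixels whose saliency is at least the mean are taken as foreground (the direction and strictness of the comparison are assumed), the morphological refinement is omitted, and the function names are hypothetical.

```python
def mean_threshold_mask(saliency):
    """Binary mask from a saliency map, thresholded at its mean value.

    Assumption: a pixel is marked salient (1) when its saliency is greater
    than or equal to the mean of the whole map.
    """
    total = sum(sum(row) for row in saliency)
    count = sum(len(row) for row in saliency)
    mu = total / count
    return [[1 if s >= mu else 0 for s in row] for row in saliency]


def split_foreground_background(image, mask):
    """Foreground = image * mask (element-wise); background = image - foreground."""
    fg = [[p * m for p, m in zip(img_row, mask_row)]
          for img_row, mask_row in zip(image, mask)]
    bg = [[p - f for p, f in zip(img_row, fg_row)]
          for img_row, fg_row in zip(image, fg)]
    return fg, bg


saliency_map = [[0.1, 0.2],
                [0.8, 0.9]]
channel = [[10, 20],
           [30, 40]]
mask = mean_threshold_mask(saliency_map)
foreground, background = split_foreground_background(channel, mask)
```

Because the background is defined as the remainder of the image, adding the two partitions always restores the original channel, which is what allows the watermarked partitions to be recombined after embedding.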

Encryption and embedding of watermark
Each color channel of the host image I is utilized to host one channel of the watermark W. Specifically, the chaotic symmetric crypto-model and the homomorphic encryption proposed by Okamoto-Uchiyama (denoted by OUHE) are applied to all RGB channels of the foreground $W_F$ and background $W_B$ regions of the watermark W. The outputs are then embedded correspondingly into $I_F$ and $I_B$. The embedding process is as follows:
Step 1: Here, we set $c_i = k_1$. The LH, HL and HH subbands are computed in the same manner. Note that, in general, α ≥ β leads to higher imperceptibility. Furthermore, $SWT^{-1}$ is applied on the LL, LH, HL, and HH subbands of $I^R_F$ and $I^R_B$, which form the watermarked red channel $I^R = I^R_F + I^R_B$. Subsequently, $\{k_1, k_2\}$ are randomly embedded into the blue channel of I at locations $\{x_1, x_2\}$. In addition, as a form of enhancement, more than one watermark can be embedded into the host image. However, when embedding more watermarks, the proposed method can only take watermarks of lower quality (i.e., lower bit-depth) as the input. Specifically, to embed 2, 4, and 8 watermarks, M = 128, 64 can be adopted, respectively.
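Arnold scrambling is named in the abstract as one of the encryption stages for the watermark channels. The following is a minimal sketch of the classical Arnold cat map on a square channel, given only as an illustration; how the keys of the proposed method parameterize the map is not reproduced here, so the `iterations` parameter and the function names are assumptions.

```python
def arnold_scramble(channel, iterations=1):
    """Apply the classical Arnold cat map to an N x N channel.

    Each pixel at (x, y) moves to ((x + y) mod N, (x + 2y) mod N).
    The number of iterations plays the role of a key (assumed here).
    """
    n = len(channel)
    out = [row[:] for row in channel]
    for _ in range(iterations):
        nxt = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n][(x + 2 * y) % n] = out[x][y]
        out = nxt
    return out


def arnold_unscramble(channel, iterations=1):
    """Invert arnold_scramble using the inverse matrix [[2, -1], [-1, 1]]."""
    n = len(channel)
    out = [row[:] for row in channel]
    for _ in range(iterations):
        prev = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                prev[(2 * x - y) % n][(y - x) % n] = out[x][y]
        out = prev
    return out


tile = [[4 * r + c for c in range(4)] for r in range(4)]
scrambled = arnold_scramble(tile, iterations=3)
restored = arnold_unscramble(scrambled, iterations=3)
```

The map only relocates pixels, so the histogram of the channel is unchanged; this is why Arnold scrambling is usually combined with a value-level cipher rather than used alone.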
Step 2: The green-channel partitions of the watermark, $\{W^G_F, W^G_B\}$, are encrypted by using the TDES [28,40] crypto-model to achieve confidentiality. According to NIST [28], TDES has a security level between DES and AES. Three 56-bit keys, i.e., $k_3$, $k_4$ and $k_5$, are utilized to encrypt $\{W^G_F, W^G_B\}$ with modulo M = 256 (as mentioned in (6)) to produce their ciphertext counterparts. The ciphertext watermark segments are then embedded into the respective segments in the green channel of the host, i.e., $\{I^G_F, I^G_B\}$. Note that $k_3 \neq k_4 \neq k_5$, and they are randomly embedded in the blue channel of I at locations $\{x_3, x_4, x_5\}$. Finally, the same parameters α and β (introduced in Step 1) are utilized to control the embedding strength when producing the watermarked green channel $I^G$.
For the OUHE encryption, the modulus n is first computed from p and q, where p and q are two large prime numbers. We then select g ∈ {2, ..., n−1} that satisfies the OUHE generator condition, and finally compute $h \equiv g^n \bmod n$.
Hence, the generated public and private keys are {h, n, g} and {p, q}, respectively. Next, the pixel values are encrypted by using {h, n, g}, where M is the modulus from Step 1 (see Fig. 2).
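For readers unfamiliar with OUHE, the following toy sketch shows the textbook Okamoto-Uchiyama scheme: key generation with n = p²q, encryption c = g^m h^r mod n, and decryption via L(x) = (x − 1)/p. It is an illustration under stated assumptions and not necessarily the exact variant used in the proposed method (for instance, the modulo-M step is omitted). The primes are deliberately tiny and insecure, and all function names are hypothetical.

```python
import math
import random


def ou_keygen(p, q):
    """Textbook Okamoto-Uchiyama key generation for primes p and q."""
    n = p * p * q
    g = 2
    # g must be coprime to n, and g^(p-1) mod p^2 must differ from 1
    while math.gcd(g, n) != 1 or pow(g, p - 1, p * p) == 1:
        g += 1
    h = pow(g, n, n)
    return (n, g, h), (p, q)  # (public key, private key)


def ou_encrypt(public_key, m, rng):
    """Encrypt a message m with 0 <= m < p as c = g^m * h^r mod n."""
    n, g, h = public_key
    r = rng.randrange(1, n)  # fresh randomness for each ciphertext
    return (pow(g, m, n) * pow(h, r, n)) % n


def _L(x, p):
    """L(x) = (x - 1) / p, defined for x congruent to 1 modulo p."""
    return (x - 1) // p


def ou_decrypt(public_key, private_key, c):
    """Recover m = L(c^(p-1) mod p^2) / L(g^(p-1) mod p^2) mod p."""
    n, g, h = public_key
    p, q = private_key
    a = _L(pow(c, p - 1, p * p), p)
    b = _L(pow(g, p - 1, p * p), p)
    b_inv = pow(b, p - 2, p)  # modular inverse via Fermat, since p is prime
    return (a * b_inv) % p


# Toy parameters: p must exceed the largest 8-bit pixel value (255).
public, private = ou_keygen(257, 263)
rng = random.Random(0)
cipher_a = ou_encrypt(public, 100, rng)
cipher_b = ou_encrypt(public, 50, rng)
# Multiplying ciphertexts adds the underlying plaintexts (modulo p).
cipher_sum = (cipher_a * cipher_b) % public[0]
```

The additive homomorphism shown in the last line is the property that allows encrypted pixel values to be combined without first decrypting them.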

Decryption and extraction of watermark
Next, $W^B_F$ is decrypted by using p and q, where δ(B) refers to the matrix containing the reciprocals of all elements in B. Note that in (15), $X^{ew(p)}$ refers to the output of raising each element in X to the power of p (i.e., element-wise exponentiation). Finally, the decrypted watermark $M = W^B_F$ is computed by using the corresponding decryption function for OUHE (see Fig. 2(j)).

Experiments
The proposed watermarking method is implemented in MATLAB 2020 running on a 7th-generation Core i7-7500U 2.9 GHz processor with 16 GB of RAM. The standard test images from the SIPI and the MSRA dataset (10K images) [4] are considered for evaluation purposes. Here, the MSRA images and watermarks are resized to 512 × 512 × 3 using the MATLAB function imresize. For all experiments conducted, α = 0.04 and β = 0.02 are set, which makes the background slightly blurry while the salient object remains completely imperceptible. We utilize the 24-bit image shown in Fig. 3 as the watermark and embed it into the host image. For evaluation purposes, we consider the following scenarios: (a) PSNR and aSSIM [4] of the watermarked image after embedding a watermark, i.e., a 24-bit color image W or a binary image $W_b$, (b) aSSIM of the extracted W, and (c) normalized correlation (NC) of the extracted $W_b$. Here, aSSIM refers to the average of the SSIM values for each of the RGB channels. In addition, for the case of the binary watermark $W_b$, we embed the same binary watermark into all three color channels of the host image.
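As a reference for the quality metrics named above, the following sketch computes the standard PSNR for 8-bit data and shows the per-channel averaging idea behind aSSIM. SSIM itself is not implemented here; the `channel_average` helper accepts any per-channel metric and is demonstrated with PSNR. Representing a channel as a flat list of pixel values and all function names are assumptions made for this example.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Standard PSNR in dB between two equally sized lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


def channel_average(metric, image_a, image_b):
    """Average a per-channel metric over the R, G and B channels.

    aSSIM is this average with SSIM as the metric; PSNR is used in the
    example below only because SSIM is outside the scope of this sketch.
    """
    return sum(metric(image_a[c], image_b[c]) for c in ("R", "G", "B")) / 3.0


host = {"R": [10, 20, 30, 40], "G": [50, 60, 70, 80], "B": [90, 100, 110, 120]}
marked = {c: [v + 1 for v in values] for c, values in host.items()}
average_psnr = channel_average(psnr, host, marked)
```

A uniform error of one intensity level gives an MSE of 1, i.e., a PSNR of 20·log10(255) ≈ 48.13 dB, which is a useful reference point when reading the PSNR values reported in this section.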

Saliency detection
First, the performance of the proposed salient object detection (SOD) method is evaluated. Some representative results are shown in Fig. 4. By visual inspection, it is verified that the proposed method is able to identify the salient objects (regions). It produces a boundary that confines the salient object with a wider dynamic range. For example, for the bird image (3rd column), our method can detect the feather and beak completely and the detected objects are brighter than those of [4]. To quantify the results, the Mean Absolute Error (MAE), F1 score, and Area Under the Receiver Operating Characteristic curve (AUROC) are recorded in Table 1. Specifically, the MAE is computed between S and G, where S and G are the saliency map and the ground truth, respectively. On the other hand, F1 is defined by using TP, FN and FP, which are the true positives, false negatives, and false positives, respectively. Likewise, specificity is defined by using TN, which refers to the true negatives. On average, 0.035, 0.790 and 0.900 are attained for MAE, F1 and AUROC, respectively. Here, we also consider the results for Peng et al.'s [35] and Singh et al.'s [46] methods. Results suggest that our proposed method outperforms these SOTA SOD methods by at least 66% in MAE, 60% in F1, and 53% in AUROC. Therefore, we conclude that our salient object detection method outperforms [4,35] and [46].
Fig. 4 Comparison of the saliency maps produced by our proposed method and Bhowmik et al.'s method [4]. The original images, the saliency maps produced by our proposed method, and those produced by [4] are shown in the 1st, 2nd and 3rd rows, respectively
Table 1 Performance of the generated saliency maps. Results are presented in the format of "proposed"/"Peng et al. [35]"/"Singh et al. [46]"
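The metrics used in this subsection can be sketched with their standard textbook definitions: MAE as the mean absolute difference between the saliency map and the ground truth, F1 = 2TP / (2TP + FP + FN), and specificity = TN / (TN + FP). The paper's exact formulations are assumed to match these; flattening the maps into lists and the function names are choices made for this example.

```python
def mean_absolute_error(saliency, truth):
    """Mean absolute difference between two equally sized lists of values."""
    return sum(abs(s - g) for s, g in zip(saliency, truth)) / len(saliency)


def confusion_counts(predicted, truth):
    """Return (TP, FP, FN, TN) for two binary masks given as flat lists."""
    tp = sum(1 for p, g in zip(predicted, truth) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, truth) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, truth) if p == 0 and g == 1)
    tn = sum(1 for p, g in zip(predicted, truth) if p == 0 and g == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def specificity(tn, fp):
    """Specificity = TN / (TN + FP); returns 0.0 when undefined."""
    denom = tn + fp
    return tn / denom if denom else 0.0


predicted_mask = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(predicted_mask, ground_truth)
```

Note that a lower MAE is better whereas higher F1 and AUROC are better, so improvement percentages for MAE refer to a reduction in error.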

Quality of watermarked image
The quality of the watermarked image after embedding a 24-bit color watermark W and a binary watermark $W_b$ is evaluated in terms of PSNR and aSSIM. The results are recorded in Table 2. The PSNR values of the watermarked image after embedding W and $W_b$ are consistently ≥ 43 dB and ≥ 54 dB, respectively. It is noticed that, in comparison to embedding $W_b$, the quality of the watermarked image is significantly lower after embedding W. This is due to the fact that embedding W affects all bitplanes in all channels of the host image, whereas in the case of embedding $W_b$, only one bitplane in each channel of the host image is modified. Besides embedding in all bitplanes of the RGB channels of the host image, the difference is also due to the precision error in floating point representation and its related operations. A similar observation is made for the case of aSSIM for embedding W and $W_b$. The mean aSSIM of the watermarked image for W is ≥ 0.9986, whereas the aSSIM for $W_b$ is ≥ 0.9999. To investigate how the quality of the watermarked image changes when using different parameter settings, results are collected by varying α and β within the range of [0.01, 0.2]. The results are summarized in Fig. 5. In the case of embedding W, the PSNR value ranges from 5 to 48 dB, while the aSSIM ranges from 0.5612 to 0.9995. For embedding $W_b$, the observed PSNR value ranges from 15 to 56 dB, and the aSSIM value ranges from 0.5789 to 0.9999. It is apparent that the quality of the watermarked image decreases when either α or β increases, and vice versa (Fig. 6).
When comparing to the conventional SOTA methods, we focus on the results collected for embedding $W_b$ because the conventional methods embed a binary image as the watermark. Results in Table 2 suggest that the PSNR attained by our proposed method is the highest, i.e., +18 dB over [4] and +13 dB over both [23,55]. On the other hand, the aSSIM value shows mixed performance. Compared to [23,55], our proposed method performs better by 8%, but it is on par with [4]. It is noteworthy that our proposed method still outperforms [23,55] when embedding W, i.e., a significantly larger payload (watermark) size. In addition, our proposed method (embedding W) is marginally inferior in comparison to [4] (embedding $W_b$). These results suggest that, despite embedding highly diffused watermark images of the same dimension and bit-depth as the host image, the output watermarked image produced by our proposed method is of high quality. Hence, the proposed method produces high quality watermarked images as compared to the conventional saliency-based IW methods [4,23,55]. In addition, for the SOTA methods considered in this work [5,21,47], the binary watermark image is embedded into the LSB bitplane of the host images, which makes them vulnerable to malicious attacks and hence less robust.

Robustness of the embedded watermark
This section reports the robustness of the embedded watermark by calculating aSSIM and Normalized Correlation (NC). Specifically, NC is computed between the mean-centered original and extracted watermarks, where $\mu_W$ and $\mu_{W'}$ are the means of the original watermark $W$ and the extracted watermark $W'$, respectively. The proposed method is evaluated by embedding the watermark in two different ways, namely, a) embedding a 24-bit watermark W and, b) embedding a binary watermark $W_b$. Various attacks including mean-filtering, median-filtering, shearing, noise, rotation, cropping and JPEG compression attacks are performed on the watermarked images for both scenarios and the results are recorded. Table 3 records the aSSIM value for the case of the 24-bit watermark W, and Table 4 records the NC value for the case of the binary watermark $W_b$. First, we consider scenario a) where a 24-bit watermark is embedded. When there are no attacks applied on the watermarked image, the aSSIM value is 0.9999, which suggests that the extracted watermarks are of high quality. When any form of attack is applied, the aSSIM value drops, but the average aSSIM for the extracted W remains high at 0.9300. The proposed method appears to be particularly robust against mean-filtering and noise attacks. As expected, the aSSIM value for cropping is particularly low because certain image information is completely removed from the watermarked image. Since the proposed method is the only known method that embeds a 24-bit color image as the watermark, there are no existing methods for comparison purposes.
Next, we consider scenario b), where a binary watermark (Fruits and Male as shown in Fig. 3(b) and (d), respectively) is embedded, and the results are recorded in Table 4. Here, the majority vote strategy is adopted because three copies of the same watermark (one from each of the RGB channels) are available. In addition, the discussion here is based on the Leena (as watermarked) image because it is the only image that is commonly evaluated by all the existing methods considered for the comparison. Among the seven types of attacks, results suggest that Jiang et al.'s method [14] is the most robust IW method against both mean-filter and median-filter attacks, being, on average, ≥ 6% higher than the rest. On the other hand, Zhang et al.'s method [55] outperforms other methods for the cases of rotation and JPEG-compression attacks, being, on average, ≥ 5% higher than the rest. For the remaining three attacks, namely shearing, noise, and cropping, our proposed method exhibits the highest resistance, with an average margin of 4%. It is noteworthy that the proposed method is originally designed to embed a 24-bit watermark, but it has been modified to embed a binary image solely for comparison purposes, because most conventional methods can only embed a binary image as the watermark. For completeness, the aSSIM and NC values of the extracted watermarks are analyzed when considering different parameter values. The results are shown in Fig. 8. However, for security reasons, one should not always choose the smallest α and β when using the proposed method, although the best quality is achieved with these settings for both the watermarked image and the extracted watermark.
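Since NC is described in terms of the means of the original and extracted watermarks, a common mean-centered form (the zero-mean normalized cross-correlation) can be sketched as follows. Whether this matches the paper's exact expression is an assumption; the watermarks are given as flat lists and the function name is hypothetical.

```python
import math


def normalized_correlation(original, extracted):
    """Zero-mean normalized correlation between two equally sized lists.

    Returns a value in [-1, 1]; 1 means the extracted watermark matches the
    original up to a positive affine change, and 0.0 is returned when either
    input is constant (the correlation is undefined in that case).
    """
    mu_o = sum(original) / len(original)
    mu_e = sum(extracted) / len(extracted)
    num = sum((a - mu_o) * (b - mu_e) for a, b in zip(original, extracted))
    var_o = sum((a - mu_o) ** 2 for a in original)
    var_e = sum((b - mu_e) ** 2 for b in extracted)
    den = math.sqrt(var_o * var_e)
    return num / den if den else 0.0


w_original = [1, 0, 1, 1, 0, 0, 1, 0]
w_inverted = [1 - b for b in w_original]
w_damaged = [1, 0, 1, 0, 0, 0, 1, 0]  # one bit flipped by a hypothetical attack
nc_damaged = normalized_correlation(w_original, w_damaged)
```

An intact extraction yields NC = 1, a fully inverted watermark yields NC = −1, and partially damaged watermarks fall in between.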
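The majority vote over the three extracted copies of the binary watermark can be sketched as follows. This is a minimal illustration assuming each copy is a flat list of bits extracted from one color channel; the function name is hypothetical.

```python
def majority_vote(bits_r, bits_g, bits_b):
    """Per-bit majority over three extracted copies of a binary watermark."""
    return [1 if (r + g + b) >= 2 else 0
            for r, g, b in zip(bits_r, bits_g, bits_b)]


# Each channel copy has a different single-bit error (hypothetical attack).
reference = [1, 0, 1, 1, 0, 0, 1, 0]
from_red = [0, 0, 1, 1, 0, 0, 1, 0]
from_green = [1, 0, 1, 0, 0, 0, 1, 0]
from_blue = [1, 0, 1, 1, 0, 1, 1, 0]
voted = majority_vote(from_red, from_green, from_blue)
```

The vote corrects any bit that is damaged in at most one of the three channels, so an attack must corrupt the same position in two channels to change the extracted bit.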

Conclusion
In this work, we proposed a saliency-based image watermarking method that improves on Bhowmik et al.'s salient object detection method [4]. Specifically, a salient object detection model is first proposed to extract the visually attentive area in the host image for generating a saliency mask. This mask is then applied to divide the foreground and background of the host and watermark images. Then, the red, green and blue watermark channels are encrypted by using Arnold, TDES and MFPE, respectively. Furthermore, the principal key, which can be used subsequently to produce all dependent keys, is embedded in the singular diagonal of the blue channel. Next, the blue channel is encrypted by using Okamoto-Uchiyama homomorphic encryption. These scrambled and encrypted watermark channels are then embedded into the respective host channels. Unlike the conventional methods that hide a binary image in the host image, the proposed method can embed a 24-bit image of the same dimension as the host. In addition, more than one watermark can be embedded when such a need arises, but at the expense of a lower quality watermark. Analysis results also indicate that the proposed method outperforms SOTA methods in terms of imperceptibility, payload and robustness. As future work, we want to analyze the effect of different wavelets and decomposition levels on the performance of the proposed watermarking method. Furthermore, in the case of multiple watermark embedding, high resolution reconstruction of the watermark images along with contrast enhancement can be explored to extract high quality watermarks.

Data Availability
The datasets analyzed during the current study are available in the Google Drive repository: https://tinyurl.com/5p5zpwhw

Declarations
The authors have no conflict of interest regarding this article. This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.