1 Introduction

Images are arguably the most frequently generated and transmitted data among multimedia contents [1,2,3]. They are constantly generated through CCTV / surveillance / dashcam recordings, in addition to smart devices such as smartphones and tablets [4]. To put numbers in context, every minute, 136,000 images are uploaded to Facebook [5] while another 59,700 images are uploaded to Instagram [6]. Furthermore, in recent years, as big data processing and visualization techniques advance, more and more infographics are generated to convey simple and straight-to-the-point messages for complex topics [7]. Since images are valuable from various perspectives, including commercial, health, and safety, to name a few, researchers are interested in protecting them from unauthorized use [8,9,10].

Image watermarking (IW) is a technique designed to insert (embed) a piece of information called a watermark into the image of interest [8, 11]. The embedded watermark is commonly utilized for verification purposes when there is a dispute over ownership, or in other words, a violation of copyright [12]. In most cases, the illegal user would process the image with the aim of destroying the embedded watermark. Therefore, it is imperative to design an IW method that is robust. In addition, it is crucial to ensure that the embedded watermark does not cause noticeable distortion to the host image while hosting as much watermark information as possible [2, 3]. As such, over the past decades, researchers have designed and implemented a number of innovative IW techniques [13, 14] to achieve higher robustness, quality, and capacity, as well as extra features such as reversibility [15] and blind extraction [1, 8, 10, 16].

However, these methods typically exhibit trade-offs among image quality, robustness, and capacity constraints, where they optimize one constraint while settling on the other two. Crucially, the significance of each constraint depends on the application scenario of IW. Therefore, the trade-off between these three constraints has to be taken into account for IW, and such trade-offs inhibit the ubiquity of IW across various application scenarios.

In order to simultaneously achieve high embedding capacity, image quality, and robustness, one approach is to leverage the different saliency regions of an image for the IW process. Therefore, in this paper, we propose a trade-off independent IW method that is able to embed a watermark of the same dimension, bit-depth, and number of color channels as the host image, e.g., a 24-bit watermark image into a 24-bit host image of the same dimension. To achieve this, we design and implement an enhanced version of a structured matrix decomposition technique for salient object detection in an image. This technique is then used to realize a trade-off independent IW. The contributions of our work are summarized as follows:

  1. First, we design an Enhanced Structured Matrix Decomposition (E-SMD) method, which significantly improves saliency detection for the host image. Specifically, E-SMD precisely extracts the salient (i.e., eye-catching) regions in the host image to produce a saliency mask. The resulting mask is then applied to segment the foreground and background regions of the host and watermark images. The segmented regions of the watermark image are shuffled independently, and the outputs are embedded into the wavelet domain of the foreground and background regions of the host image.

  2. Then, we extend the proposed E-SMD to operate in blind mode. To achieve this, we generate an approximation of the original host image from the watermarked image, which is subsequently used in place of the original image during the watermark extraction process.

  3. In terms of capacity, the proposed method is able to embed a watermark of the same resolution and bit-depth as the host image, which realizes a trade-off independent IW.

  4. Finally, we analyze the performance of our proposed IW method through comprehensive experiments. In terms of saliency detection, the E-SMD method outperforms the baseline method [17]. As for IW, in both non-blind and blind modes, the proposed method demonstrates all-round improvements over the benchmarked methods in terms of capacity, image quality, and robustness against malicious attacks. In addition, the proposed method preserves the watermark in the approximated host image during the blind watermark extraction scenario.

The outcomes of this work suggest that our proposed IW method simultaneously achieves high embedding capacity, image quality, and robustness, which is a first of its kind in the domain of IW.

The rest of this paper is organized as follows: Section 2 reviews the state-of-the-art (SOTA) methods. Section 3 details the proposed saliency detection method and the watermark embedding process, and then details the watermark extraction process, where a filtering method is put forward to estimate the original host image to facilitate watermark extraction (i.e., blind mode). Experimental results and discussions are detailed in Sections 4 and 5, and Section 6 concludes this work.

2 Related work

There are various ways to categorize IW methods, and one way is by the requirement of the original host image during watermark extraction [2, 3, 18, 19]. Specifically, to extract the watermark from the watermarked image, non-blind methods [12, 14, 20,21,22,23,24,25] require the original host image, while blind methods [15, 16, 18, 26,27,28,29,30] need not refer to the original host image. While IW methods have improved over the years, the advancements and innovations can be broadly categorized into the following areas: (a) performing one or more suitable transformations; (b) better selection of venues to host the watermark, and; (c) pre/post-processing (including cryptographic operations) on the watermark before / after embedding. The following subsections review the recent developments in non-blind and blind IW methods.

2.1 Non-blind watermarking methods

Najafi et al. [12] and Zhou et al. [21] recently developed IW methods aiming to achieve higher robustness and imperceptibility via the combination of different transformations and singular value decomposition (SVD). Specifically, Najafi et al. [12] use the sharp frequency localized contourlet transform (SFLCT) to decompose the host and watermark images into approximation and detailed sub-bands. Next, SVD is applied to the detailed sub-bands of both images, and the detailed sub-bands of the watermark are embedded into the detailed sub-bands of the host image, and vice versa. In Zhou et al.’s method [21], the discrete cosine transform (DCT) and discrete wavelet transform (DWT) (i.e., DCT-DWT), as well as the discrete fractional random (DFRN) transformation with chaotic maps, are applied to design an IW method. Specifically, DCT is applied on each \(8\times 8\) block of the LL subbands to generate the feature vector. Subsequently, the watermark is embedded in the mid-frequency components of the DFRN. On the other hand, Moosazadeh et al. [20] and Pourhadi et al. [22] contributed to venue selection for watermark embedding in order to enhance the embedding capacity and robustness. Specifically, Moosazadeh et al.’s method [20] uses a self-learning mechanism and embeds the watermark in the high frequency AC coefficients of the host image. Likewise, [22] utilize the stationary wavelet transform (SWT) and an optimization algorithm to speed up the process of attaining higher robustness.

Recently, [23] and [24] developed IW methods which rely on salient object detection to achieve robustness. Specifically, Jiang et al. [23] presented a tensor mode expansion (TME) based IW method that embeds a watermark (\(64 \times 64\)) into the DCT-transformed SVD matrix of the host image (of dimension \(512 \times 512\)). On the other hand, Zhang et al. [24] apply sharp salient features in the LL subband of the contourlet transformation to embed a pseudorandom binary sequence as the watermark in the salient region of the host image.

2.2 Blind watermarking methods

Researchers have also introduced innovative ways to extract the watermark without relying on the original host image. Roy et al. [26] and Hamidi et al. [27] recently developed blind IW methods to achieve high robustness and imperceptibility via innovative combinations of transformations. Specifically, Roy et al. [26] proposed to embed the shuffled watermark into the green and blue channels of the DCT-transformed mid-frequency AC coefficients of the host image. On the other hand, in Hamidi et al.’s method [27], DCT is applied to the magnitude part of the Discrete Fourier Transformed (DFT) host image. The Arnold-mapped watermark is then embedded into the low frequency coefficients of the host image.

On the other hand, innovations in venue selection for IW purposes have also been observed in the literature. Among them, SVD appears to be a commonly considered decomposition for IW. For example, DWT and dual-SVD based IW methods are independently proposed by [29, 31] to achieve high robustness and high embedding capacity. Specifically, SVD is applied to the singular matrix of the DWT-SVD decomposed host image. The singular matrix of the watermark (\(64 \times 64\)-bit) is then embedded into the dual SVD of the HH sub-band of the host (of dimension \(512 \times 512\)). Similarly, Prabha et al. [16] embed the 2D-Logistic mapped watermark into the Walsh Hadamard Transform (WHT) and SVD (i.e., WHT-SVD) transformed host image. In Haghighi et al.’s IW method [28], the lifting wavelet transformation (LWT) and a genetic algorithm are combined to recover watermark information lost due to tampering of the host. First, LWT and half-toning are applied on the host image. Subsequently, the Chebyshev transformation is applied to the LL approximation of the host image to embed the pseudorandom bits as the watermark.

Besides that, Liu et al. [30] embed an Affine-transformed watermark into the blocks (i.e., the upper triangular matrix) of the Schur decomposition of the host image. However, both [30] and [32] are vulnerable [33] and low in terms of embedding capacity. Furthermore, Hu et al. [34] embed a gray-scale watermark into the selected low frequency Zernike coefficients of the host image. The Zernike coefficients are considered because the quantized Zernike moments provide quality-preserving compression of the watermark, which in turn contributes to robustness against channel attacks. In addition to combined transformations and better venue selection, saliency detection [35] has also received attention in the community [36, 37]. Specifically, Liu et al. [36] first detect saliency by analyzing the Laplacian distribution of the wavelet coefficients in the DWT and then embed a pseudorandom binary sequence watermark into the visually sharp edges of the host image. Furthermore, Bhowmik et al. [37] presented a multi-level DWT decomposition based visual attention model to detect the salient object in the host image. The operations in [37] depend on global maxima and mean calculations. Using alpha strength, watermark bits are then embedded into the subbands of the host’s foreground and background.

2.3 Drawbacks

After analyzing the aforementioned SOTA non-blind and blind IW methods, the following drawbacks are identified (as shown in Table 6), which motivate the development of our trade-off independent IW method. First, despite the non-blind methods [12, 23, 38] producing high-quality watermarked images, SVD-based IW methods have been argued to be vulnerable in nature [33, 39]. In addition, the watermark is not encrypted before embedding, which leads to low security in [23, 40, 41]. Secondly, although new venues are identified for watermark embedding, a major drawback of the methods in [20] and [22] is the low embedding capacity, e.g., only a \(32 \times 32\)-bit watermark can be embedded into a \(512 \times 512\) grayscale host image, i.e., 1:16.

Furthermore, blind methods [26] and [27] are lacking in terms of embedding capacity, where only \(64 \times 64\) bits can be embedded into a \(512 \times 512\) host, i.e., 1:8. Moreover, in addition to the low embedding capacity, the methods [16, 29, 31] are also vulnerable [33]. In the case of saliency-based IW, [24, 36, 37, 42, 43] are also low in terms of embedding capacity. All-in-all, our analysis concludes that the conventional methods optimize one requirement while settling with the other two requirements. Therefore, a trade-off independent IW scheme needs to be developed.

3 Proposed watermarking method

Generally, the equally sized host (I) and watermark (W) images are segmented (to obtain the corresponding foreground and background regions) using the proposed E-SMD method. Figure 1 illustrates an overview of the proposed method. As seen in Stage A of this figure, the proposed E-SMD method first segments the host and watermark images into their corresponding foreground and background regions (discussed in Section 3.1). The watermark’s foreground and background regions are then embedded into the host’s corresponding regions using the proposed IW method (discussed in Section 3.2).

Fig. 1

An overview of the proposed E-SMD method for trade-off independent multi-region IW, which comprises three stages. In Stage A, we produce a saliency mask to partition the foreground and background of the host image. In Stage B, the corresponding watermark regions are embedded into the wavelet domain of the foreground and background regions of the host image. In Stage C, the foreground and background regions of the watermark image are extracted from the foreground and background regions of the host image. For blind extraction, an approximated host image derived from the watermarked image is used instead of the host image

First, we aim to improve the saliency detection method proposed by Peng et al. [17]. Specifically, [17] uses a low-rank regularization process on the features of image patches. Although SVD is applied, Peng et al. do not utilize the fine details of the image, which are captured by the U matrix of the SVD. As a remedy, we propose E-SMD, which collects the coarse and fine edges of an image. E-SMD is the core part of our proposed IW method, as it dictates the host and watermark segments for the purpose of embedding. Specifically, the E-SMD method is implemented for multi-region watermarking to enhance resistance against typical IW attacks [44, 45]. Simply put, we embed the watermark segments based on the saliency of the host image. Prior to embedding, the watermark W is shuffled using multi-level random chaotic-map scrambling. The shuffling process is introduced to scatter the watermark information throughout the host image. In fact, multiple chaotic maps are utilized to scramble the watermark image segments before embedding into the corresponding host image segments. As a result, the watermark is scrambled “differently” using different chaotic maps. This data scattering process leads to robustness against common IW attacks. Equations (6), (7), and (8) describe how the foreground and background regions of W are inserted into the foreground and background regions of I, respectively.

Sections 3.1 and 3.2 detail these processes mathematically, along with the intermediate images, for better readability.

3.1 Salient object detection using E-SMD

Given a color host image I, its corresponding grayscale image A is computed. The coarse and fine image features are then extracted from A to form the segmentation mask \(\varSigma \). Stage A of Fig. 1 illustrates the processes performed in our proposed E-SMD-based salient object detection method.

Step 1: To extract the coarse features, the Gaussian gradient operator \(G_{\sigma }\) is applied on A to obtain the coarse image \(g = G_{\sigma } \bigotimes A\), where

$$\begin{aligned} G_{\sigma } = \frac{1}{\sqrt{2\pi \sigma ^2}}\exp \bigg (- \frac{x^{2} + y^{2}}{2\sigma ^{2}}\bigg ). \end{aligned}$$
(1)

Here, \(\sigma \) refers to the standard deviation, \(\bigotimes \) denotes the convolution operation, and \((x,y)\) refers to the row and column of the image. The coarse edges \(\epsilon \) are then extracted by applying the Canny edge detector to g, and their complement is added to g to form \(E^{c} = g + (255-\epsilon )\). Likewise, the fine features of A are extracted by applying the Laplacian filter \(L=[0,1,0;1,-4,1;0,1,0]\), and the output is then added to A, viz., \(E^f = A + A \bigotimes L\).

As an example, the grayscale image A, coarse image g, coarse-edge-enhanced image \(E^c\) and fine-edge-enhanced image \(E^f\) are shown in Fig. 2(a), (b), (c) and (d), respectively. Next, the coarse and fine edges are combined to produce the accumulated edges and the output is denoted by E where \(E = E^c + E^f\). The features in E are crucial for merging different sets of pixels to form patches and eventually a complete object. The accumulated-edge-enhanced image E of Fig. 2(a) is shown in Fig. 2(e).
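For concreteness, a minimal Python sketch of Step 1 is given below, assuming NumPy, SciPy, and OpenCV are available. The function name, the value of \(\sigma \), and the Canny thresholds are illustrative choices of ours rather than values prescribed by the method.

```python
import cv2
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def extract_edge_features(A, sigma=1.5):
    """Step 1 sketch: coarse- and fine-edge-enhanced images from grayscale A."""
    A = A.astype(np.float64)
    g = gaussian_filter(A, sigma=sigma)                      # g = G_sigma (x) A (Eq. 1)
    eps = cv2.Canny(np.uint8(np.clip(g, 0, 255)), 50, 150)   # coarse edges epsilon
    E_c = g + (255.0 - eps)                                  # E^c = g + (255 - epsilon)
    L = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
    E_f = A + convolve(A, L)                                 # E^f = A + A (x) L
    return E_c, E_f, E_c + E_f                               # accumulated edges E = E^c + E^f
```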

Fig. 2

The intermediate outputs of Step 1

Fig. 3

The intermediate output of M in Step 4 of the proposed salient object detection method after performing the bilateral filtering and merging operations for the first five rounds

Step 2: The accumulated-edge-enhanced image E from Step 1 forms the patches in A. To refine these patches, the variable-sized patch decomposition (VSPD) method is put forward and performed on each patch (Fig. 3). Specifically, the morphological dilation operation is performed on E, using the structuring element \(SE = [1,0,1;0,1,0;1,0,1]\), to obtain the output image \(E^d\). To magnify the visual details in the patches, contrast enhancement is then applied to \(E^d\) using histogram equalization to produce the output \(E^e\). Each connected region in \(E^e\) forms a patch, denoted by \(n_j\) for \( j = 1, 2, \cdots \). The corresponding output is shown in Fig. 4.

Step 3: The Gaussian summation [46] is performed on each patch \(n_j\). Specifically, the operation is performed on one row of pixels at a time. Let \(R_x\) denote the array of pixels on the \(x\)-th row and let \(m_{j,x} = R_x \cap n_j\). Unless specified otherwise, the union and intersection operations (denoted by ‘\(\cup \)’ and ‘\(\cap \)’) are performed on the positions of the pixels instead of their values. Suppose \(m_{j,x} = \{p_1, p_2, \cdots , p_{d_j}\}\), where \(d_j\) denotes the number of pixels in \(m_{j,x}\). Then, the pixels in \(m_{j,x}\) are sorted in ascending order \(\{p_{s(1)}, p_{s(2)}, \cdots , p_{s(d_j)}\}\), where \(p_{s(1)} \le p_{s(2)} \le \cdots \le p_{s(d_j)}\). Next, \(\lfloor \frac{d_j}{2}\rfloor \) values are computed by taking the sums \(\mu _k = p_{s(k)} + p_{s(d_j-k+1)}\) for \(k = 1, 2, \cdots , \lfloor \frac{d_j}{2}\rfloor \). The values in \(m_{j,x}\) are then updated using these sums. As an example, suppose \(m_{j,x} = \{99, 101, 103, 105, 111, 121, 141\}\) after sorting. From these \(d_j = 7\) numbers, we compute \(\lfloor 7/2\rfloor = 3\) sums. Specifically, the sum of the maximum and minimum pixel values, i.e., \(141+99 = 240\), becomes the middle value. The remaining two sums, i.e., \(121+101 = 222\) and \(111+103 = 214\), are utilized to fill up the array. In other words, the array \(m_{j,x}\) is updated as \(m_{j,x} \leftarrow [222, 222, 222, 240, 214, 105, 105]\). As another example with an even number of elements, if \(m_{j,x} \leftarrow [99,101,103,105,111,121,141,145]\), it is updated to \([242,242,242,244,224,216,216,216]\) after applying the Gaussian summation. The operation is repeated for all rows of pixels \(R_x\) for all patches \(n_j\).
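To make the update rule concrete, the sketch below reproduces the two worked examples above. The fill pattern (the second sum repeated on the left, the extreme-pair sum in the middle, and the remaining sums plus, for odd-length rows, the unpaired median laid out to the right with the last value repeated) is our reading of these examples; the function name and the guard for very short rows are our own.

```python
def gaussian_summation_row(vals):
    """Step 3 sketch: update one row of a patch via the Gaussian summation."""
    p = sorted(vals)
    d = len(p)
    h = d // 2
    if h < 2:                                      # very short rows left unchanged
        return list(vals)
    mu = [p[k] + p[d - 1 - k] for k in range(h)]   # mu_k = p_s(k) + p_s(d-k+1)
    mid = (d - 1) // 2
    out = [mu[1]] * mid + [mu[0]]                  # left block, then extreme-pair sum
    right = mu[2:] + ([p[h]] if d % 2 else [])     # remaining sums (+ median if odd)
    while len(out) < d:                            # repeat the last value to fill
        out.append(right.pop(0) if right else out[-1])
    return out

# Reproduces the paper's two worked examples:
assert gaussian_summation_row([99, 101, 103, 105, 111, 121, 141]) \
    == [222, 222, 222, 240, 214, 105, 105]
assert gaussian_summation_row([99, 101, 103, 105, 111, 121, 141, 145]) \
    == [242, 242, 242, 244, 224, 216, 216, 216]
```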

Fig. 4

Decomposition in multidimensional patches. Here, only the patches in the red channel are shown because those in the green and blue channels are exactly the same by construction

Step 4: Let \(m_j\) denote the patch \(n_j\) after performing the row-wise Gaussian summation in Step 3, viz.,

$$\begin{aligned} m_j \leftarrow \bigcup _{x}m_{j,x}. \end{aligned}$$
(2)

The image \(M = \bigcup _{j} m_{j}\) is then formed. Subsequently, an edge-preserving bilateral filtering operation is performed on \(M\) to refine the boundaries of the patches \(m_j\). Specifically, two filters are applied in a sequential manner, i.e., \(M' = (M \bigotimes H_1) \bigotimes H_2\), where \(H_1=[1,1,0,1,1; 1,0,0,0,1; 0,0,0,0,0; 1,0,0,0,1; 1,1,0,1,1]\) and \(H_2=[1,0,1; 0,0,0; 1,0,1]\). The patches \(m_j\) are then updated, i.e., \(m_j \leftarrow M' \bigcap m_j\) for all \(j\). Next, \(\theta = \arg \min _j |m_j|\) is computed to search for the smallest patch, so that \(\forall j, |m_{\theta }| \le |m_{j}|\), where \(|m_j|\) denotes the number of pixels in the patch \(m_j\). The neighboring patches of \(m_{\theta }\) are then identified, and \(m_{\theta }\) is merged with the largest neighboring patch \(m_{\lambda (\theta )}\). In other words, the patch \(m'_{\lambda (\theta )} \leftarrow m_{\theta } \cup m_{\lambda (\theta )}\) is formed. Subsequently, the image \(M\) is updated as follows:

Fig. 5

The saliency map \(S\) and the binary segmentation map \(\varSigma \) for the image shown in Fig. 2(a)

$$\begin{aligned} M \leftarrow \left\{ \bigcup \{m_j\} \setminus m_{\theta } \setminus m_{\lambda {(\theta })}\right\} \cup m'_{\lambda (\theta )}, \end{aligned}$$
(3)

and the indices of the patches \(m_j\) are updated accordingly. The same operations (i.e., bilateral filtering and patch-merging) are repeatedly performed on \(\{m_j\}\) until the number of unique grayscale values is less than a predefined threshold \(\tau _{v}\). Figure 3 shows the intermediate results after applying the first five rounds of the aforementioned operations.
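The loop below sketches one filter-and-merge round in Python. It is a loose rendition under stated assumptions: the filters are normalized for numerical stability, patches are tracked with a label image, and a dilation-based neighborhood test stands in for the paper's patch bookkeeping; all function names are ours.

```python
import numpy as np
from scipy.ndimage import binary_dilation, convolve

# Filters from Step 4 (normalized here for numerical stability).
H1 = np.array([[1,1,0,1,1],[1,0,0,0,1],[0,0,0,0,0],
               [1,0,0,0,1],[1,1,0,1,1]], dtype=float)
H2 = np.array([[1,0,1],[0,0,0],[1,0,1]], dtype=float)

def refine_round(M, labels):
    """One Step-4 round: filter M, then merge the smallest patch m_theta
    into its largest neighboring patch m_lambda(theta)."""
    M = convolve(convolve(M, H1 / H1.sum()), H2 / H2.sum())
    ids, sizes = np.unique(labels[labels > 0], return_counts=True)
    size_of = dict(zip(ids, sizes))
    theta = min(size_of, key=size_of.get)          # smallest patch
    ring = binary_dilation(labels == theta) & (labels != theta) & (labels > 0)
    nbrs = np.unique(labels[ring])
    if nbrs.size:                                  # merge into largest neighbor
        lam = max(nbrs, key=size_of.get)
        labels = np.where(labels == theta, lam, labels)
    return M, labels

def refine_and_merge(M, labels, tau_v=5, max_rounds=500):
    """Repeat until fewer than tau_v unique (rounded) gray values remain."""
    for _ in range(max_rounds):
        if len(np.unique(np.round(M))) < tau_v:
            break
        M, labels = refine_round(M, labels)
    return M, labels
```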

Step 5: The output \(M\) from Step 4 (with the number of unique colors \(< \tau _{v}\)) is flipped, i.e., \(M \leftarrow 255 - M\). Subsequently, \(M \leftarrow (M \bigotimes H_3) \bigotimes H_4\) is performed to obtain the saliency map \(S\) (see Fig. 5(a)), where \(H_3 = [1,1,0,1,1; 1,0,0,0,1; 0,0,1,0,0; 1,0,0,0,1; 1,1,0,1,1]/13\) and \(H_4 = [2,0,2; 0,2,0; 2,0,2]/10\). A simple thresholding process is then performed on \(S\) to obtain the binary segmentation mask \({\varSigma }\), where

$$\begin{aligned} {\varSigma }(x,y) \leftarrow \left\{ \begin{array}{cl} 1 &{} \text {if }S(x,y) > \tau _{s};\\ 0 &{} \text {otherwise.} \end{array}\right. \end{aligned}$$
(4)

The binary segmentation map for \((\tau _{v}, \tau _{s}) = (5, 0.5)\) is shown in Fig. 5(b).

3.2 Watermark embedding

Stage B of Fig. 1 illustrates the processes performed in our proposed watermark embedding method. Once the binary mask \({\varSigma }\) of the host image \(I\) is obtained, it is applied to the watermark, which is of the same dimension as the host image \(I\). To facilitate the discussion, the background of \(I\) is defined as \(B_I = (1 - {\varSigma }) \cap I\), and the foreground is defined as \(F_I = {\varSigma } \cap I\). Similarly, the background \(B_W = (1 - {\varSigma }) \cap W\) and foreground \(F_W = {\varSigma } \cap W\) are defined for the watermark image \(W\). To spread the watermark throughout the host image, both \(F_W\) and \(B_W\) are scrambled using a 3D-Arnold map [22], followed by row- and column-wise shuffling. In other words, each pixel value, i.e., a triplet of \((r,g,b)\) values, is first processed as follows:

$$\begin{aligned} \left[ \begin{array}{c} r'\\ g'\\ b' \end{array} \right] = \Bigg ( \left[ \begin{array}{ccc} 1 &{} c_{1} &{} c_{2} \\ c_{3} &{} 1+c_{1}c_{3} &{} c_{2}c_{3} \\ c_{4} &{} c_{1}c_{2}c_{3}c_{4} &{} 1+c_{2}c_{4} \end{array} \right] \left[ \begin{array}{c} r\\ g\\ b \end{array} \right] \Bigg ) \text {mod }M, \end{aligned}$$
(5)

where the \(c_i\)’s are positive numbers and \(M = 512\). Next, the processed pixel values are shuffled using the logistic map to produce \(F^e_W\) and \(B^e_W\). Here, the logistic map is defined as \(d_{i+1} = f d_{i}(1-d_{i})\). Subsequently, \(F^e_W\) is embedded into \(F_I\) in the transformed wavelet domain. Specifically, all color channels \(\kappa \in \{r,g,b\}\) in \(F^e_{W}\) and \(F_{I}\) are transformed using the Haar wavelet, and the LL-subband of the corresponding channel of \(I\) is updated as follows:

$$\begin{aligned} LL'(F_{I,\kappa }) \leftarrow LL(F_{I,\kappa }) + \alpha \times LL(F^e_{W,\kappa }). \end{aligned}$$
(6)

Similarly, the LL-subband of \(B_I\) is updated as follows:

$$\begin{aligned} LL'(B_{I,\kappa }) \leftarrow LL(B_{I,\kappa }) + \beta \times LL(B^e_{W,\kappa }). \end{aligned}$$
(7)

Specifically, the wavelet subbands representing the foreground segments of the host image are modified to embed the foreground segments of the watermark image. Similarly, the wavelet subbands representing the background segments are modified to embed the background segments of the watermark image. Note that \(\alpha \)-blending is utilized to embed the watermark foreground into the foreground of the host image. Similarly, \(\beta \)-blending is utilized to embed the watermark background into the background of the host image. Note that \(0.01 \le \alpha , \beta \le 0.2\), and the same parameter values are required for watermark extraction. The watermarked foreground image \(F'_I\) is determined by performing \(F'_{I,\kappa } \leftarrow iDWT(LL',LH,HL,HH)_{F_{I,\kappa }}\), and the watermarked background image \(B'_{I,\kappa }\) is formed in a similar manner. The watermarked image \(I'_{\kappa }\) is formed by combining \(F'_{I,\kappa }\) and \(B'_{I,\kappa }\) as follows:

$$\begin{aligned} I'_{\kappa }(x,y) = \left\{ \begin{array}{cl} F'_{I,\kappa }(x,y) &{} \text {if } {\varSigma }(x,y) = 1;\\ B'_{I,\kappa }(x,y) &{} \text {otherwise.} \end{array}\right. \end{aligned}$$
(8)

Finally, the image \(I' \leftarrow \{I'_{r}, I'_{g}, I'_{b}\}\) is output.
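A compact sketch of Stage B under these equations is shown below, assuming the PyWavelets package (pywt) for the Haar DWT. The 3D-Arnold step follows (5); the logistic-map shuffling is omitted for brevity, and all function names are ours.

```python
import numpy as np
import pywt

def arnold_3d(pixel, c, M=512):
    """Eq. (5): mix one (r, g, b) triplet; c = (c1, c2, c3, c4)."""
    c1, c2, c3, c4 = c
    A = np.array([[1,                  c1,          c2],
                  [c3,      1 + c1 * c3,       c2 * c3],
                  [c4, c1 * c2 * c3 * c4, 1 + c2 * c4]])
    return tuple(int(v) % M for v in A @ np.asarray(pixel))

def embed_region(host_region, wm_region, strength):
    """Eqs. (6)-(7) for one channel: LL' <- LL(host) + strength * LL(watermark)."""
    LL, details = pywt.dwt2(host_region.astype(float), 'haar')
    LL_w, _ = pywt.dwt2(wm_region.astype(float), 'haar')
    return pywt.idwt2((LL + strength * LL_w, details), 'haar')

def embed(host, wm_scrambled, mask, alpha=0.04, beta=0.02):
    """Stage B sketch: embed watermark foreground/background into the
    corresponding host regions, then recombine via the mask (Eq. 8)."""
    out = np.empty(host.shape, dtype=float)
    for k in range(3):                              # kappa in {r, g, b}
        F = embed_region(mask * host[..., k], mask * wm_scrambled[..., k], alpha)
        B = embed_region((1 - mask) * host[..., k],
                         (1 - mask) * wm_scrambled[..., k], beta)
        out[..., k] = np.where(mask == 1, F, B)     # Eq. (8)
    return np.clip(out, 0, 255)
```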

It is noteworthy that the salient region is considered for embedding the watermark because it is a region of importance. Therefore, even if an attacker is tampering with the watermarked image, to a certain extent, they would want to maintain the quality of the salient region. This indirectly adds robustness to all saliency-based IW methods because it is likely that the attacker will retain the salient region, which is the main content of the watermarked image.

3.3 Watermark extraction (Non-blind and blind)

Stage C of Fig. 1 illustrates the processes performed for non-blind and blind watermark extraction from the attacked watermarked image. First, we present the extraction steps when the original host image \(I\) is available. Specifically, the mask \({\varSigma }\) is computed using \(I\). Then, the same foreground \(F_I\) and background \(B_I\) images are formed using the original image \(I\). Similarly, the watermarked foreground \(F''_I\) and background \(B''_I\) images are extracted from the watermarked image \(I''\), which could potentially be attacked. From (6) and (7), the watermark can be extracted by computing

$$\begin{aligned} LL(F^{e'}_{W}) = \frac{LL''(F_I) - LL(F_I)}{\alpha } \end{aligned}$$
(9)
$$\begin{aligned} LL(B^{e'}_{W}) = \frac{LL''(B_I) - LL(B_I)}{\beta }. \end{aligned}$$
(10)

Next, an estimate of the scrambled foreground watermark image \(F^{e'}_{W}\) is reconstructed by computing \(F^{e'}_{W} = iDWT[LL(F^{e'}_{W}), HL(F^{e}_{W}), LH(F^{e}_{W}), HH(F^{e}_{W})]\). The inverse shuffling process and inverse 3D Arnold map are applied to \(F^{e'}_{W}\) to obtain the extracted foreground watermark \(F'_{W}\). The same operations are performed to obtain an estimate of the background watermark \(B'_{W}\). Finally, the extracted watermark \(W'\) is formed as follows:

$$\begin{aligned} W'(x,y) = \left\{ \begin{array}{cl} F'_W(x,y) &{} \text {if }{\varSigma }(x,y) = 1;\\ B'_W(x,y) &{} \text {otherwise.} \end{array}\right. \end{aligned}$$
(11)
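The non-blind inversion of (9) and (10) can be sketched as follows, again assuming PyWavelets; as in the paper, the detail subbands of the scrambled watermark are reused when inverting the DWT, so they are passed in explicitly here. The function name is ours.

```python
import pywt

def extract_region(host_region, marked_region, wm_details, strength):
    """Invert Eqs. (6)-(7): recover the scrambled watermark's LL subband via
    Eq. (9) (strength = alpha) or Eq. (10) (strength = beta), then apply the
    inverse DWT using the watermark's own detail subbands (HL, LH, HH)."""
    LL_host, _ = pywt.dwt2(host_region.astype(float), 'haar')
    LL_marked, _ = pywt.dwt2(marked_region.astype(float), 'haar')
    LL_wm = (LL_marked - LL_host) / strength
    return pywt.idwt2((LL_wm, wm_details), 'haar')
```

Applying this per channel to the foreground (with \(\alpha \)) and background (with \(\beta \)), then undoing the shuffling and the 3D-Arnold map, yields \(F'_{W}\) and \(B'_{W}\), which are combined via (11).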

On the other hand, when the original host image \(I\) is not available, an approximated version of \(I\), denoted by \(\tilde{I}\), is constructed. The purpose of producing the approximated image (that is almost equivalent to the original host image) is to allow extraction of the embedded watermark without the original host image (i.e., blind mode), from a potentially attacked watermarked image. First, we set \(\tilde{I} \leftarrow I''\). For each color channel \(\tilde{I}_{\kappa }\) for \(\kappa \in \{r,g,b\}\), the following operation is performed:

$$\begin{aligned} \tilde{I}_{\kappa } = \tilde{I}_{\kappa } - \omega _{\kappa } \times \left( \dfrac{\max _{\kappa } - \mu _{\kappa }}{2}\right) , \end{aligned}$$
(12)

where \(\max _{\kappa }\) denotes the maximum value in \(\{\tilde{I}_{\kappa }(x,y)\}\) and \(\mu _{\kappa }\) denotes the average value of \(\tilde{B}_{\kappa }\) (i.e., the background region of \(\tilde{I}_{\kappa }\)). In addition, \(\omega _r = 0.40\), \(\omega _g = 0.35\), and \(\omega _b = 0.25\) are set. Subsequently, a \(3\times 3\) Gaussian smoothing filter is applied to the resulting image, i.e., \(\tilde{I}_{\kappa } \leftarrow \tilde{I}_{\kappa } \bigotimes H_5\), where \(H_5 = [1,1,1;1,1,1;1,1,1]/9\). Finally, \(\tilde{I} \leftarrow \{\tilde{I}_r, \tilde{I}_g, \tilde{I}_b\}\) becomes the approximation of the original host image \(I\), and the same steps for handling the non-blind mode are performed to extract the watermark from \(I''\).
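The following sketch mirrors (12) in Python; as a simplification, the channel mean stands in for \(\mu _{\kappa }\) (the paper computes it over the background region), and the function name is ours.

```python
import numpy as np
from scipy.ndimage import convolve

def approximate_host(I_marked, omegas=(0.40, 0.35, 0.25)):
    """Blind-mode sketch of Eq. (12): shift each channel down by
    omega * (max - mean) / 2, then smooth with the 3x3 averaging filter H5."""
    H5 = np.ones((3, 3)) / 9.0
    I_tilde = I_marked.astype(float)
    for k, w in enumerate(omegas):                   # kappa in {r, g, b}
        ch = I_tilde[..., k]
        ch = ch - w * (ch.max() - ch.mean()) / 2.0   # Eq. (12), whole-channel mean
        I_tilde[..., k] = convolve(ch, H5)           # 3x3 smoothing
    return np.clip(I_tilde, 0, 255)
```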

4 Experiments and results

The proposed IW method is implemented in MATLAB 2020 running on a Core i7 (7th Gen, 7500U, 2.9 GHz) processor with 16 GB of random access memory. The standard test images and the MSRA dataset (containing 10,000 images) are considered for the experiments. It is noteworthy that the MSRA dataset comes with ground truth, and we have produced our own ground truth for the standard test images, which can be accessed online. Here, the MSRA images are resized to \(512 \times 512 \times 3\) by using the MATLAB function imresize. Unless specified otherwise, \((\tau _{v}, \tau _{s}) = (5, 0.5)\) and \((\alpha , \beta ) = (0.04, 0.02)\) are set, which makes the background slightly blurry while the distortion in the salient object remains completely imperceptible. In addition, two types of watermark images are considered, namely: (a) the 24-bit color image shown in Fig. 6(a), and; (b) the most significant bit (MSB) plane of the grayscale version of Fig. 6(b). For both scenarios, the watermark has the same dimension as the host image, i.e., \(512 \times 512 \times 3\) (\(512 \times 512 \times 3 \times 8 = 6{,}291{,}456\) bits are embedded into the channels of the host image). The following sub-sections evaluate the proposed method against conventional and deep learning based SOTA approaches.

Fig. 6

Watermark images

Fig. 7

Comparison of saliency maps produced by the proposed method, Peng et al.’s method [17], Singh et al.’s method [47], and Zhang et al.’s method [24]. The first row shows the original color images. The second row shows the saliency maps produced by our proposed method. The third, fourth and fifth rows show the saliency maps produced by [17, 47], and [24], respectively

4.1 Saliency detection

First, the performance of the proposed salient object detection method (Section 3.1) is evaluated. Figure 7 shows the input images, and the saliency maps produced by the proposed method are shown in the second row. Based on visual inspection, it is observed that the proposed method is able to identify the salient object (viz., foreground) and produce a boundary resembling that of the ground truth. To quantify the results, the Mean Absolute Error (MAE), Weighted F-measure (WFM), and Area Under the Receiver Operating Characteristic curve (AUROC) are recorded in Table 1 for each saliency map. On average, 0.035, 0.775, and 0.87 are attained for MAE, WFM, and AUROC, respectively.

Next, we compare our saliency maps to those produced by Peng et al.’s method [17], Singh et al.’s method [47], and Zhang et al.’s method [24], where the results are shown in the third, fourth, and fifth rows of Fig. 7, respectively. It is observed that, in general, all methods achieve the basic functionality of saliency detection. However, comparatively, the saliency map generated by the proposed method confines the salient region completely and renders a wider dynamic range. For example, consider the legs of the sofa chair (first column). They are clearly detected and connected to the seat in our saliency map, but not clearly captured by the other methods. As another example, consider the flamingo’s head (fourth column), which is clearly segmented in our saliency map, while the others either perceive the head as less salient or the segmented head region contains textural noise. These observations are supported by the numerical results recorded in Table 1, which suggest that the proposed method achieves better saliency detection. Hence, we conclude that our saliency detection method outperforms Singh et al.’s [47], Peng et al.’s [17], Borji et al.’s [48], and Zhang et al.’s [24] methods.

Last but not least, an ablation analysis is performed on the proposed E-SMD model, where we decompose the model into five components and observe the outcome attained by each component, one by one, on an incremental basis. As shown in Table 2, the ablation test is performed in five steps. First, feature extraction is performed to obtain both the coarse and fine features from the host image. From the first figure in Table 2, it can be observed that the salient object is highlighted over the background region. Subsequently, when the feature extraction and patch decomposition phases are both performed, the boundaries of the salient object become more prominent over the background region. However, the low-rank patches still surround the salient object. In order to preserve saliency, low-rank regularization is performed, where patches are merged and the salient objects in the resulting image start to take better shape. Next, bilateral filtering is applied in order to preserve the edges of the salient object while suppressing the background region, which makes the object more prominent. Finally, the saliency map is produced after applying the Laplacian filtering, followed by the thresholding process to generate the binary map. From Table 2, we can observe that the MAE value is consistently reduced from 5.330 to the final value of 0.037. On the other hand, although mixed trends are observed for WFM and AUROC, the values are, in general, improving (i.e., increasing), and the best results of 0.769 and 0.873, respectively, are observed when all five components are included. Hence, based on this ablation analysis, we conclude that each component included in the proposed model plays its role towards obtaining the final salient object.

Table 1 Performance of the generated saliency maps
Table 2 Ablation study of the proposed E-SMD method on a sample image

4.2 Quality of watermarked image

First, we consider Scenario (a), where a 24-bit watermark image is embedded into the host image. Here, the quality of the watermarked image is measured in terms of PSNR and aSSIM, i.e., the average SSIM over all three color channels [2, 3, 19]. It is observed that \(PSNR \ge 45\) dB and \(aSSIM = 0.9999\) for all host images. This implies that, despite embedding an image of the same dimension and bit-depth, the output watermarked image is of high quality. Note that both the proposed non-blind and blind methods produce the same watermarked image because these modes of operation only differ in the extraction process. For further comparison, we consider the deep learning based data hiding method put forward by [9]. Note that Baluja’s method is not designed for IW purposes, but it is capable of embedding an image of the same bit depth and resolution as the host (cover) image, hence we consider it for comparison purposes. Specifically, Baluja’s method achieves an average of 41.2 dB and 0.98 for PSNR and SSIM, respectively, which are lower than those achieved by our proposed IW method (i.e., \(\ge 45\) dB and aSSIM = 0.9999).

Table 3 Comparison between the proposed method\(^1\) and [12]\(^2\), [20]\(^3\), [21]\(^4\), [22]\(^5\), [24]\(^6\), [13]\(^7\), [26]\(^8\), [27]\(^9\), [28]\(^{10}\), [29]\(^{11}\), [30]\(^{12}\), [16]\(^{13}\), [34]\(^{14}\) in terms of quality (PSNR (dB) and aSSIM) of the watermarked image when embedding a binary image as watermark into the host, where no attacks are involved

Next, we consider Scenario (b), i.e., embedding a binary image into a 24-bit host image. The results are recorded in Table 3 (see third column). Here, the binary watermark is first mapped to a corresponding 24-bit image by using two simple rules, i.e., \(0 \rightarrow (0,0,0)\) and \(1 \rightarrow (255,255,255)\). Therefore, the newly mapped watermark image has only 2 possible pixel values, namely, \((0,0,0)\) or \((255,255,255)\). Subsequently, the same mechanism in Scenario (a) is applied to embed the watermark into the host image. Although aSSIM remains at 0.9999, it is apparent that the quality of the watermarked image is significantly higher in terms of PSNR (i.e., \(\ge 55\) dB) when embedding a binary watermark image (as opposed to a 24-bit image). The reason is that, for Scenario (b), fewer modifications are made to the host image when handling a binary watermark.

Table 4 Quality of the extracted 24-bit color watermark (aSSIM) for the proposed method

For comparison with conventional methods, the results for [12, 20,21,22,23,24] (i.e., non-blind methods) as well as [16, 26,27,28,29,30] (i.e., blind methods) are also recorded in Table 3. Similarly, other SOTA methods [13, 34] are compared with the proposed method, and the watermarked image quality results are reported in Table 3. Note that the above-mentioned methods embed a binary image (or bits), and hence the comparison is performed under Scenario (b). It is apparent that, when compared to the conventional IW methods, the proposed method produces a high-quality watermarked image despite embedding a significantly larger watermark payload (see the 2nd column of Table 5). These observations suggest that the image quality is well-preserved in our proposed method and the produced watermarked image is imperceptible. The results suggest that the proposed method outperforms conventional SOTA methods in terms of image quality. In addition, it is noteworthy that the SOTA methods considered here embed a watermark of smaller dimension than the host image.

4.3 Robustness of embedded watermark

To verify the robustness of the watermark embedded using our proposed method, various (image processing based) attacks are performed on the watermarked image. The attacks include mean filtering, median filtering, noise addition, rotation, cropping, JPEG lossy compression with the quality factor set to 75, and shearing. As a baseline, the embedded watermark is also extracted directly from the watermarked image without undergoing any form of attack. To quantify the robustness, aSSIM is considered for Scenario (a), while the bit error rate (BER) and Normalized Correlation (NC) are adopted for Scenario (b) [2, 3, 19]. Here, BER is defined as:

$$\begin{aligned} BER = \dfrac{\sum _{x}\sum _{y}|W'(x,y) - W(x,y)|}{R \times C}, \end{aligned}$$
(13)

where \(W'(x,y)\) refers to the extracted watermark (which can potentially be attacked) and \(R \times C\) refers to the dimension of the watermark. In addition, NC is defined as follows:

$$\begin{aligned} NC=\frac{\sum _{i}\sum _{j}(W_{(i,j)}-\mu _{W})(W'_{(i,j)}-\mu _{W'})}{\sqrt{\sum _{i}\sum _{j}(W_{(i,j)}-\mu _{W})^{2}}\sqrt{\sum _{i}\sum _{j}(W'_{(i,j)}-\mu _{W'})^{2}}}, \end{aligned}$$
(14)

where \(\mu _{W}\) and \(\mu _{W'}\) are the means of the original watermark \(W\) and the extracted watermark \(W'\), respectively.
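Both metrics are straightforward to compute; a NumPy sketch (function names ours) follows.

```python
import numpy as np

def ber(W, W_ext):
    """Bit error rate (Eq. 13) for binary watermarks of dimension R x C."""
    return np.abs(W_ext.astype(float) - W.astype(float)).sum() / W.size

def nc(W, W_ext):
    """Normalized correlation (Eq. 14) between original and extracted watermarks."""
    a = W.astype(float) - W.astype(float).mean()
    b = W_ext.astype(float) - W_ext.astype(float).mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))
```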

The results for Scenario (a) are shown in Table 4. Overall, aSSIM \(\ge \) 0.9, implying a high similarity between the original and extracted watermarks even after an attack. As expected, when operating in blind mode (i.e., extracting without using the original host image), the extracted watermark is consistently of lower quality in comparison to the results for non-blind mode. From another perspective, this outcome implies that when the original host image is available, the proposed method can utilize it to extract a watermark of higher quality. For illustration purposes, Fig. 8 shows the attacked watermarked Lena images and the corresponding extracted watermarks.

Fig. 8

Watermarked images and their attacked counterparts. The watermark extracted from attacked-watermarked image is shown in the corresponding column

Among the attacks, cropping and compression exert a greater impact on the quality of the extracted watermark. This is an expected outcome because: (a) cropping directly removes data from the watermarked image, and; (b) JPEG compression changes all pixel values as well as introducing discontinuities at the \(8\times 8\) block boundaries. Despite the changes to the pixel values caused by these attacks, the extraction of the watermark, although incomplete, can still be achieved, as illustrated in Fig. 8. Specifically, under the cropping attack, a small region of the watermark image vanishes because part of the watermarked image has been removed. We also report the average results collected using the MSRA dataset in the last row of Table 4. In general, the performance trend observed for the MSRA dataset is similar to that of the individually reported images, and therefore, we omit the detailed discussions here. As a comparison, the extracted image in Baluja’s method [9] achieves an average aSSIM of 0.97 when no attacks are applied. This outcome is slightly better than our proposed method (i.e., 0.96 for blind mode). However, as described in the preceding subsection, the scope and method of [9] are for general data hiding and not specifically for IW purposes, with the ability to obfuscate the content of the hidden message (image). Apart from that, its robustness against common watermark attacks is not evaluated and correspondingly no results are reported.

On the other hand, the results for Scenario (b) are recorded in Table 5. It should be noted that a majority vote is applied after extracting the watermark since three copies of the watermark (one from each of the RGB channels) are available. Here, the results for Lena are presented for direct comparison purposes, followed by the average results of the MSRA dataset. Similar to Scenario (a), both NC and BER are better when operating in the non-blind mode. Likewise, cropping and compression are also observed to exert a greater impact on the quality of the extracted watermark. Nonetheless, NC \(\ge 0.9147\) and BER \(\le 0.1124\) are observed in the blind mode. When compared to the conventional methods, the proposed method shows the highest robustness against the noise and rotation attacks. For the mean, median, cropping and compression attacks, the proposed non-blind method is only inferior to the conventional methods by a small margin, i.e., on average, approximately 0.04 and 0.11 for NC and BER, respectively. Overall, the results in Table 5 suggest that the proposed blind method outperforms the conventional methods by an average margin of approximately 0.02 and 0.10 for NC and BER, respectively.

To complete the discussion, we also record the average results obtained from 10,000 images in the MSRA dataset in the last two rows of Table 5. The results suggest that the average performance of the proposed method (non-blind and blind) on the MSRA dataset agrees, in general, with the performance observed for the Lena image. Hence, we omit the detailed discussions here, although it is noteworthy that the performance of the non-blind mode is higher than that of the blind mode, as expected, and the robustness against attacks is consistently high.

Table 5 Normalized correlation and bit error rate analysis for binary watermark extraction for the proposed and SOTA methods

4.4 Further analysis

In this section, we evaluate the performance of the proposed IW method when using different parameter settings for \(\alpha \) and \(\beta \), as well as the quality of the approximated original image \(\tilde{I}\). Specifically, the PSNR, aSSIM, and NC (non-blind and blind) for the proposed method are collected by varying both \(\alpha \) and \(\beta \) from 0.01 to 0.2. The results obtained from 10,000 images in MSRA are shown in Fig. 9. Interestingly, all graphs exhibit a common trend, where the considered evaluation metrics increase when both \(\alpha \) and \(\beta \) decrease, although the rate of increment varies. Therefore, it is apparent that the smallest possible \(\alpha \) and \(\beta \) value (i.e., 0.01) will yield the best outcome for quality and robustness. However, other values should be considered because the \(\alpha \) and \(\beta \) values are required to extract the watermark; if the same values (e.g., 0.01) are utilized all the time, the watermark can be extracted by an adversary, or a more targeted attack can be launched. Therefore, for all the experiments conducted in earlier sections, we considered \(\alpha = 0.04\) and \(\beta = 0.02\), which differ from 0.01.

Fig. 9

The graph of PSNR, aSSIM and NC (non-blind and blind) against various \(\alpha ,\beta \) values

Furthermore, for blind IW, the similarity between the original and approximated host images is calculated using PSNR and aSSIM. The average PSNR value of the approximated host image is 44.10 dB (\(\max = 44.48\), \(\min = 43.60\), \(\sigma = 0.2237\)), which is lower than that of the watermarked images (i.e., \(> 45\) dB). Similarly, the average aSSIM value of the approximated host image is 0.9579 (\(\max = 0.9764\), \(\min = 0.9361\), \(\sigma = 0.0086\)), which is also less than that of the watermarked image (i.e., 0.9999). Since the quality of the approximated host images is high, it is feasible to use them in the watermark extraction process, as demonstrated above.

Based on the results and observations presented above, we conclude that, in comparison to the considered conventional methods, the proposed method is able to produce watermarked images of higher quality, even when embedding more information as in Scenario (a). In terms of robustness, the proposed method also leads the performance for three out of six types of considered attacks, while no results were reported for the shear attack by the conventional methods. These results suggest that all three main criteria in IW are simultaneously improved, i.e., quality, capacity, and robustness.

5 Discussion

One strategy is to exploit multiple saliency regions of an image for the IW process in order to achieve high embedding capacity, image quality, and robustness at the same time. As a result, in this research, we offer a trade-off independent IW approach capable of embedding a watermark of the same dimension, bit-depth, and number of colour channels as the host image. To do this, we build and implement an improved structured matrix decomposition (E-SMD) approach for detecting prominent objects in images. This method is then applied to achieve trade-off independent IW. First, we developed an E-SMD approach that considerably enhances the host image’s saliency detection. E-SMD, in particular, accurately isolates the salient areas in the host image to provide a saliency mask. The generated mask is then used to segment the host and watermark’s foreground and background areas. The watermark image’s segmented sections are shuffled individually, and the outputs are inserted in the wavelet domain of the host’s foreground and background regions. The proposed E-SMD is then extended to work in blind mode. To accomplish this, we derive an approximation of the original host image from the watermarked image, which is then employed in place of the original image throughout the watermark extraction procedure. In terms of embedding capability, our approach embeds a watermark with the same resolution and bit-depth as the host, allowing for trade-off independent IW.

Finally, we conduct extensive tests to evaluate the performance of our proposed IW technique. The E-SMD technique beats the baseline method [17] in terms of saliency detection. In terms of IW, for both non-blind and blind modes, the proposed method outperforms the benchmarked techniques in terms of capacity, image quality, and robustness against malicious attacks. Furthermore, when evaluated under the blind watermark extraction scenario, the proposed technique prevents the watermark from being destroyed in the approximated host image.

6 Conclusion

In this work, an IW method based on saliency detection is proposed. Specifically, the E-SMD is proposed to extract the salient areas in the host image for producing a saliency mask. The resulting mask is then applied to partition the foreground and background of the host and watermark images. The watermark is then shuffled using chaotic maps, and the resulting shuffled watermark is embedded into the wavelet domain of the host image. A filtering operation is also designed to approximate the original host image when it is not available, hence allowing the proposed IW method to operate in both non-blind and blind modes. In the best case scenario, a 24-bit image can be embedded as the watermark into another 24-bit image of the same dimension while maintaining an aSSIM of 0.9999. The proposed IW method also exhibits comparable performance for robustness, where it achieves leading performance for three out of six commonly applied watermark attacks. Experimental results also suggest that the proposed IW method outperforms the SOTA methods in terms of image quality, capacity, and robustness, hence achieving a trade-off independent IW method.

6.1 Limitations

The proposed IW method achieves subpar performance in terms of robustness against the mean and median filtering attacks, mainly due to the convolution processes involved. In addition, the performance of IW could be highly dependent on the settings of the specific parameters of the chaotic maps, which require careful adjustment.

6.2 Future work

In the future, we plan to study how varying the decomposition level impacts our method’s quality, robustness, and capacity to address the limitations. In addition, we intend to explore the performance when employing different wavelets and more computationally efficient encryption algorithms for watermark pre-processing prior to embedding.