1 Introduction

Recent advances in information technologies have enabled users to access, manipulate and distribute digital multimedia easily, allowing massive production and sharing of digital data. However, issues have arisen regarding the protection of intellectual property because the current technology also facilitates unauthorized copying and illegal distribution of multimedia. To overcome these issues, security approaches such as encryption, watermarking, and perceptual hashing have been reported in the literature.

Digital watermarking has evolved very quickly and is attracting growing interest in practical applications. Besides copyright protection, digital watermarking has been used in digital copy tracking, broadcast monitoring, steganography and data authentication. There are two classes of digital watermarking: multi-bit and one-bit watermarking. In multi-bit watermarking, the watermark consists of a sequence of bits representing meaningful information such as an ID or a binary logo. In this case, the role of the decoding scheme is to extract, bit by bit, the full version of the watermark in order to recover the hidden information [1, 7, 15, 16, 24, 25, 26, 27, 35, 37, 39, 40, 43]. In one-bit watermarking, however, the watermark serves as a verification code and the role of the detector is to check the presence or absence of the watermark [2, 13, 21, 22, 30]. In practice, one-bit watermarking can be used in copy detection, copyright protection, and broadcast monitoring.

The key idea of watermark embedding is to introduce controlled modifications to all or some selected samples of the host data. These modifications can be performed in the spatial domain or in the transform domain. Although spatial domain methods are simple to implement, embedding in the transform domain provides higher performance in terms of imperceptibility and robustness. Commonly used transforms include the Discrete Wavelet Transform (DWT), the Discrete Cosine Transform (DCT), the Singular Value Decomposition (SVD) and the Discrete Fourier Transform (DFT). Due to its desirable features, especially its ability to exploit the characteristics of the Human Visual System (HVS), the DWT is one of the most widely used and studied domains in the field of digital watermarking. However, this transform has two major drawbacks: (i) lack of shift invariance, which means that small shifts in the input data lead to major variations in the distribution of the energy between DWT coefficients at different scales; and (ii) poor directional selectivity for diagonal features [20].

To overcome these limitations, Kingsbury derived a new kind of wavelet transform called the Dual-Tree Complex Wavelet Transform (DTCWT) [19], which combines desirable properties of the DWT and the Complex Wavelet Transform (CWT), namely: (i) near shift invariance; (ii) good directional selectivity; (iii) perfect reconstruction; (iv) limited redundancy; and (v) low computational complexity [20]. Due to these advantageous properties, the DTCWT has become an attractive embedding domain for designing efficient watermarking systems. The first work in this context was proposed by Loo and Kingsbury [29], and many works have since built upon the idea of DTCWT-domain watermarking for images [3, 7, 12, 21, 27, 31, 43] and videos [4, 5, 9, 34]. In image watermarking, most of the published work is concerned with the multi-bit approach [3, 7, 12, 27, 31, 43] and, to the best of our knowledge, very little effort has been devoted to one-bit watermarking [21]. The only work worth mentioning here was reported in [21], where the authors proposed two blind additive watermark detection structures in the DTCWT domain. The authors first demonstrated that the concatenated real and imaginary components of the DTCWT detail sub-bands can be statistically modeled by the Generalized Gaussian Distribution (GGD). Then, they adapted a likelihood-ratio based detector, initially proposed in [13], and the Rao detector reported in [33] to operate in the DTCWT domain. They found that the Rao-based detector is more practical and provides better results than the likelihood-ratio based detector.

In video watermarking, a number of one-bit watermarking techniques have been published [4, 5, 9, 34]. Recently, Asikuzzaman et al. [5] presented three versions of a blind additive watermarking algorithm to combat illegal video distribution. The watermark is additively embedded in all the 3rd level DTCWT sub-bands of the video chrominance channel and detection is carried out using a normalized cross-correlation rule. In the first version, the authors built upon a previous work published in [4] to detect the watermark only from the sub-bands where it was originally embedded (i.e. the sub-bands of level 3 of the DTCWT decomposition). The second version was designed to resist the downscaling-in-resolution attack by extracting the watermark from any level of the DTCWT decomposition, depending on the downscaling rate, rather than only from the sub-bands of the 3rd level. Unlike these two versions, which use a symmetric key approach, the third version is based on a keyless detection approach where the watermark can be detected using only information extracted from the frames. This version can resist temporal de-synchronization attacks, such as frame dropping, frame insertion or frame rate conversion.

In this paper, a blind additive watermarking system for still images operating in the DTCWT domain is proposed. In order to overcome the problem of controlling watermark imperceptibility in additive watermarking, a new perceptual masking model is proposed. This model builds upon the work of [44], but is adjusted here to operate in one-bit additive watermarking in the DTCWT domain. Note that the system developed in [44] is a multi-bit watermarking scheme using a multiplicative rule and operating in the DWT domain. The proposed model exploits HVS characteristics, namely frequency band sensitivity, brightness masking and texture masking, to quantify the amount of unnoticeable changes in the DTCWT domain. It is worth mentioning that no masking model exploiting the aforementioned characteristics and operating in the DTCWT domain has been reported in the literature. At the watermark detection stage, we have adapted a well-known watermark detector based on the Rao test. The performance of this detector relies heavily on the statistical modeling of the host data. Therefore, the DTCWT coefficients are modeled by a GGD as suggested in previous works. Extensive experiments have been carried out to assess the performance of the proposed system, and the results show its efficiency in terms of imperceptibility and robustness, with a clear superiority over related schemes. Also, through experiments, we have demonstrated that it is possible to achieve good detection performance with fixed GGD parameters rather than estimating them for each image. This reduces the computational complexity at the detection stage.

The rest of the paper is structured as follows. Section 2 provides a brief introduction to the DTCWT. Section 3 describes the proposed watermarking system. Experimental results are reported and discussed in Section 4. Conclusions are drawn in Section 5.

2 Introduction to the dual tree complex wavelet transform

The Dual Tree Complex Wavelet Transform was first introduced by Kingsbury [19]. This transform has gained special attention because it exhibits the desirable properties of the DWT and the CWT, that is, perfect reconstruction, computational efficiency, approximate shift invariance and directionally selective filters [20]. Instead of the single filter tree of the original DWT, the DTCWT uses two filter trees to produce two sets of coefficients which can be combined to obtain complex coefficients. In practice, the DTCWT is implemented by using two real DWTs that use different sets of filters. The first DWT generates the real part of the transform while the second DWT gives the imaginary part. This makes the transform redundant by a factor of \(2^{d}\) for d-dimensional signals. To invert the DTCWT, the real part and the imaginary part are each inverted using the inverse of the corresponding real DWT to get two real signals. These two signals are then averaged to reconstruct the final signal [36].

In the case of digital images (i.e. 2-D signals), the DTCWT generates two complex low frequency sub-bands and six complex high frequency sub-bands at each level of decomposition, the latter representing the outputs of six directional filters oriented at angles of ±15°, ±45° and ±75° [9] (Fig. 1). Mathematically, the low frequency coefficients can be expressed by

$$ x(\lambda, L, u, v) = Re(x(\lambda, L, u,v)) + \text{j}Im(x(\lambda, L, u,v)); \text{where :} L \in \lbrace L_{1}, L_{2}\rbrace $$
(1)

and the high frequency coefficients can be written as

$$ x({\Lambda}, \theta, u, v) = Re(x({\Lambda}, \theta, u,v)) + \text{j}Im(x({\Lambda}, \theta, u,v)) $$
(2)

for

$$\begin{array}{@{}rcl@{}} \theta &\in&\lbrace-75, -45, -15, + 15, + 45, + 75\rbrace\\ 0 &\leq& u\leq \frac{N}{2^{\Lambda}}-1\\ 0 &\leq& v\leq \frac{M}{2^{\Lambda}}-1 \end{array} $$

where Re(.) and Im(.) are the real and the imaginary parts, respectively. L1 and L2 represent the low-frequency sub-bands obtained from the first and the second tree of decomposition, respectively. Λ is the decomposition level, 𝜃 is the direction of the sub-band, and N and M represent the size of the input image. The variables u and v indicate the location of the coefficient in each sub-band.
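As a quick check of the sub-band sizes given above, the following Python sketch counts the real numbers stored by a 2-D decomposition. It assumes, as in the text, six complex detail sub-bands per level and two complex low-frequency sub-bands kept at the coarsest level, each of size N/2^Λ × M/2^Λ. The function name is hypothetical and practical implementations may store the low-pass band differently.

```python
def dtcwt_real_coefficient_count(n_rows, n_cols, levels):
    """Count the real numbers stored by a 2-D DTCWT with `levels` levels.

    Assumed layout (from the text): six complex detail sub-bands per level
    and two complex low-frequency sub-bands (L1, L2) at the coarsest level.
    """
    total = 0
    for level in range(1, levels + 1):
        band_size = (n_rows // 2 ** level) * (n_cols // 2 ** level)
        # six complex sub-bands, each coefficient holds two real numbers
        total += 6 * 2 * band_size
    coarse_size = (n_rows // 2 ** levels) * (n_cols // 2 ** levels)
    # two complex low-frequency sub-bands at the last level
    total += 2 * 2 * coarse_size
    return total


count = dtcwt_real_coefficient_count(512, 512, 2)
redundancy = count / (512 * 512)
print(redundancy)  # 4.0
```

Under these assumptions the count is four times the number of pixels whatever the number of levels, which agrees with the redundancy factor quoted for images.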

Fig. 1 Example of the 2-level DTCWT sub-bands structure

The DTCWT has been used in many image processing applications such as image denoising, classification, segmentation and sharpening, digital watermarking, and texture analysis and synthesis. In the field of watermarking, the near shift invariance of the DTCWT is particularly important since it helps the watermark resist geometric distortions. Also, the DTCWT offers powerful perceptual characteristics as it exhibits better directional sensitivity in high frequency sub-bands when compared to the DWT, hence offering higher imperceptibility of embedded watermarks [28].

3 Proposed watermarking system

As depicted in Fig. 2, the proposed watermarking system comprises two parts: watermark embedding and watermark detection. At the embedding stage, the 2-D binary watermark is first decomposed using a 1-level DTCWT and the obtained high frequency coefficients are embedded in the high frequency coefficients of the DTCWT transformed image by using an additive rule. To overcome the drawback of the additive rule in controlling the amplitude of the inserted watermark, a new visual masking model is used. In the detection phase, the high frequency coefficients of the 1-level DTCWT transformed candidate watermark along with the high frequency DTCWT coefficients of the watermarked image are presented to the watermark detector in order to verify the presence of the candidate watermark.

Fig. 2 Block diagram of the processes of the proposed watermarking system

3.1 Proposed masking model

In the literature, little research has been devoted to the use of visual masking models in DTCWT watermarking. These models have been applied to still images [27, 29] and video [5, 9]. Liu et al. [27] adopted the perceptual masking model that was initially proposed for DWT coefficients in [23]. However, the technique was not blind as it requires the original image at detection. In this work, we propose a new perceptual masking model in the DTCWT domain for blind additive image watermarking. This model builds upon the idea of the Just Perceptual Weighting (JPW) presented in [44], which exploits three HVS characteristics, namely band sensitivity, local brightness and texture masking. More precisely, the proposed model combines a spatial frequency sensitivity function, a brightness masking function and a texture masking function to compute a weight for each DTCWT coefficient of the image. This weight describes the amount of change that can be introduced in the DTCWT coefficients without triggering the sensitivity of the HVS. The weight value for a coefficient x(Λ, 𝜃, u, v) is formulated as follows

$$\begin{array}{@{}rcl@{}} vm[x({\Lambda},\theta,u,v)] &=& SF({\Lambda},\theta)^{a}LB({\Lambda},\theta,u,v)^{b}\left( [TM(Re(x({\Lambda},\theta,u,v)))]^{c}\right.\\ &&\left.+ \text{j}[TM(Im(x({\Lambda},\theta,u,v)))]^{c}\right) \end{array} $$
(3)

where SF(Λ, 𝜃) represents the spatial frequency for a sub-band (Λ, 𝜃); LB(Λ, 𝜃, u, v) is the local brightness for a coefficient x(Λ, 𝜃, u, v); and TM(.) is the texture masking adjustment, applied separately to the real and imaginary parts of the coefficient located at position (u, v) in the sub-band (Λ, 𝜃). The parameters a, b and c were obtained through extensive experiments; the optimal values are a = b = 0.25 and c = 0.02.
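The combination in (3) can be sketched in a few lines of Python. This is only an illustration: the function name and argument layout are hypothetical, and the three factors are assumed to have been computed beforehand for one sub-band (a scalar SF and arrays for LB and for the texture masks of the real and imaginary parts).

```python
import numpy as np


def visual_mask(sf, lb, tm_real, tm_imag, a=0.25, b=0.25, c=0.02):
    """Complex perceptual weight of Eq. (3) for every coefficient of a sub-band.

    sf      : scalar frequency weight of the sub-band (assumed given)
    lb      : array of local brightness values, same shape as the sub-band
    tm_real : texture mask computed from the real parts
    tm_imag : texture mask computed from the imaginary parts
    """
    lb = np.asarray(lb, dtype=float)
    tm_real = np.asarray(tm_real, dtype=float)
    tm_imag = np.asarray(tm_imag, dtype=float)
    gain = (float(sf) ** a) * (lb ** b)  # SF^a * LB^b, shared by both parts
    return gain * (tm_real ** c + 1j * tm_imag ** c)
```

The real part of the result weights the watermark added to the real part of a coefficient and the imaginary part weights the watermark added to its imaginary part.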

3.1.1 Spatial frequency sensitivity

It is known that the HVS is sensitive to patterns and textures which can be perceived as spatial frequencies. Furthermore, this sensitivity has been shown to be dependent on the orientation of texture. Particularly, the HVS is more sensitive to vertical and horizontal lines and edges in an image than those with a 45-degree orientation [10]. Normally, the spatial frequency response is described by the sensitivity to luminance contrast as a function of spatial frequency, and this is referred to as the Contrast Sensitivity Function (CSF). In the case of DWT, a CSF is usually implemented by assigning a single value to each sub-band. This represents a frequency weighting factor that describes the average sensitivity of the HVS for the covered frequency range [32].

In this work, we propose to use the CSF model proposed by Hill et al. [14] for the DTCWT, so that a single frequency weighting factor is assigned to each DTCWT sub-band. Note that the real and imaginary parts receive the same value since the frequency factor depends only on the decomposition level and the sub-band orientation. The values for each DTCWT sub-band are reported in Table 1.

Table 1 The values of SF [14]

3.1.2 Local brightness masking

According to Barni et al. [6], the human eye is less sensitive to modifications that occur in very dark and very bright areas. This characteristic has been introduced to design perceptual models, especially in the DWT domain [6, 42], where the local brightness is exploited by using the approximation sub-band. In this work, we propose to evaluate local brightness of DTCWT sub-bands at a given level based on the magnitude of the low frequency sub-bands of that level. A mathematical formulation is given by

$$ LB(\lambda,\theta,u,v) = \left\{ \begin{array}{ll} 1 + L^{\prime}(\lambda,\theta,u,v), & \text{if }\theta \in\lbrace+ 15^{\circ}, + 45^{\circ}, + 75^{\circ}\rbrace \\ 1 + L^{\prime\prime}(\lambda,\theta,u,v), & \text{if }\theta \in\lbrace-15^{\circ}, -45^{\circ}, -75^{\circ}\rbrace \end{array} \right. $$
(4)

with

$$ \begin{array}{ll} L^{\prime}(\lambda,\theta,u,v) = \left\{ \begin{array}{ll} 1 - \mid x(\lambda,L_{1},u,v)\mid, & \text{if }\mid x(\lambda,L_{1},u,v)\mid < 0.5; \\ \mid x(\lambda,L_{1},u,v)\mid, & \text{otherwise}. \end{array} \right.\\ \newline\\ L^{\prime\prime}(\lambda,\theta,u,v) = \left\{ \begin{array}{ll} 1 - \mid x(\lambda,L_{2},u,v)\mid, & \text{if }\mid x(\lambda,L_{2},u,v)\mid < 0.5; \\ \mid x(\lambda,L_{2},u,v)\mid, & \text{otherwise}. \end{array} \right. \end{array} $$
(5)

where ∣.∣ represents the magnitude of a complex number, and x(λ, L1, u, v) and x(λ, L2, u, v) are the values of the DTCWT coefficients in the low frequency sub-bands L1 and L2 at level λ, respectively. Note that the magnitudes of the low frequency sub-bands are all normalized into the range [0,1] before computing the local brightness masking.
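A minimal Python sketch of (4) and (5) follows. The text does not say how the magnitudes are normalized, so division by the maximum magnitude is an assumption here, and both function names are hypothetical. The caller passes the L1 sub-band for positive orientations and the L2 sub-band for negative ones.

```python
import numpy as np


def normalize_magnitude(low_band):
    """Magnitude of a complex low-frequency sub-band scaled into [0, 1].

    Dividing by the maximum is an assumption; the text only states that
    the magnitudes are normalized into [0, 1].
    """
    mag = np.abs(np.asarray(low_band))
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def local_brightness(norm_mag):
    """Eqs. (4)-(5): LB = 1 + L, where L folds dark values onto bright ones."""
    norm_mag = np.asarray(norm_mag, dtype=float)
    folded = np.where(norm_mag < 0.5, 1.0 - norm_mag, norm_mag)
    return 1.0 + folded
```

With this folding, very dark and very bright regions both receive the largest weight (close to 2), while mid-grey regions receive the smallest (1.5).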

3.1.3 Texture masking

It is well known that the human eye is less sensitive to alterations in highly textured regions than in smooth and homogeneous areas. This makes it possible to hide or mask other patterns, such as watermarks, in textured areas in an imperceptible manner; this is referred to as texture masking. In this work, a Noise Visibility Function (NVF) [38] that characterizes the local image properties is used to model the textured regions. The NVF used is based on a stationary Generalized Gaussian model because, according to Kwitt et al. [21], the DTCWT coefficients can be well modeled using a GGD. The perceptual weight describing sensitivity to changes in textured areas is given for each DTCWT coefficient x(λ, 𝜃, u, v) as [38]

$$\begin{array}{@{}rcl@{}} TM(x(\lambda,\theta,u,v)) &=& 1-NVF(x(\lambda,\theta,u,v)) \\ &=& 1-\frac{\omega(x(\lambda,\theta,u,v))}{\omega(x(\lambda,\theta,u,v))+\sigma^{2}(\lambda,\theta)} \end{array} $$
(6)

where \(\sigma^{2}(\lambda,\theta)\) is the variance of the sub-band 𝜃 at level λ, and

$$ \omega(x(\lambda,\theta,u,v)) = \frac{\gamma[\eta(\gamma)]^{\gamma}}{\|r(x(\lambda,\theta,u,v))\|^{2-\gamma}} $$
(7)

where

$$ r(x(\lambda,\theta,u,v)) = \frac{x(\lambda,\theta,u,v) - \overline{x}(\lambda,\theta,u,v)}{\sigma(\lambda,\theta)} $$
(8)

where \(\overline{x}(\lambda,\theta,u,v)\) represents a local mean, computed using a local window of size (2L + 1) × (2L + 1) centered at (u, v). It is given by \(\overline{x}(\lambda,\theta,u,v) = \frac{1}{(2L + 1)^{2}} {\sum }_{m = -L}^{L}{\sum }_{n = -L}^{L} x(\lambda,\theta,u+m,v+n)\). ∥.∥ denotes the matrix norm, and \(\eta (\gamma ) = \sqrt {\frac {\Gamma (3/\gamma )}{\Gamma (1/\gamma )}}\), where Γ(.) is the gamma function. The parameter γ is the shape parameter that characterizes the GGD of each sub-band. This parameter can be estimated as described in [11]. However, in this work, we propose to use fixed values for the parameter γ because, as demonstrated in Section 4, this helps to reduce the computational complexity while enhancing the detection performance of the system.
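The texture mask of (6) to (8) can be sketched as below for one real-valued array (the real or the imaginary part of a sub-band). Several choices are assumptions made for illustration only: edge replication at the borders, the small constant `eps` that avoids division by zero, the default window half-width, and the function names. For a scalar residual the norm reduces to an absolute value.

```python
import math

import numpy as np


def local_mean(band, half_width):
    """Mean over a (2L+1) x (2L+1) window centred on each sample.

    Borders are handled by edge replication (an assumption).
    """
    size = 2 * half_width + 1
    padded = np.pad(band, half_width, mode="edge")
    out = np.zeros(band.shape, dtype=float)
    for dm in range(size):
        for dn in range(size):
            out += padded[dm:dm + band.shape[0], dn:dn + band.shape[1]]
    return out / size ** 2


def texture_mask(band, gamma_shape, half_width=1, eps=1e-12):
    """TM = 1 - NVF for a real-valued array, following Eqs. (6)-(8)."""
    band = np.asarray(band, dtype=float)
    sigma = band.std()
    eta = math.sqrt(math.gamma(3.0 / gamma_shape) / math.gamma(1.0 / gamma_shape))
    r = (band - local_mean(band, half_width)) / (sigma + eps)
    omega = gamma_shape * eta ** gamma_shape / (np.abs(r) ** (2.0 - gamma_shape) + eps)
    nvf = omega / (omega + sigma ** 2)
    return 1.0 - nvf
```

For γ = 2 the weight ω equals 1 and the mask reduces to σ²/(1 + σ²), the Gaussian form of the NVF; a flat region gives a mask of 0, meaning that no change can be hidden there.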

3.2 Watermark embedding and detection

The watermark to be embedded, W, is a 2-D array with values in {−1, +1} generated by a pseudo-random sequence generator (PRSG) whose seed is the secret key k. However, as pointed out in [5, 21, 29], this bipolar watermark cannot be inserted directly into the DTCWT coefficients because, due to the redundancy of the DTCWT, some components of the watermark that lie in the null space of the inverse DTCWT may be lost during the reconstruction process. To overcome this issue, it has been proposed to embed the DTCWT coefficients of the watermark into the host data. In this work, a one-level DTCWT is applied to the watermark W to obtain a low frequency sub-band w(1) and six detail sub-bands w(1, 𝜃), as depicted in Fig. 3. The coefficients of the six high frequency sub-bands constitute the watermark to be embedded in the second-level DTCWT coefficients of the host image. In our implementation, the watermark is inserted into the real and imaginary parts of the high frequency DTCWT coefficients of the sub-bands with 𝜃 = ±45° via an additive rule as follows

$$\begin{array}{@{}rcl@{}} y(\lambda,\theta,u,v) &= &Re(x(\lambda,\theta,u,v)) + \delta Re(vm(\lambda,\theta,u,v))[Re(w^{\prime}(1, \theta))/max(Re(w^{\prime}(1, \theta)))] \\ &&+ \text{j}[ Im(x(\lambda,\theta,u,v)) + \delta Im(vm(\lambda,\theta,u,v))[Im(w^{\prime}(1, \theta))/max(Im(w^{\prime}(1, \theta)))]] \end{array} $$
(9)

with

$$ \begin{array}{ll} w^{\prime}(1,\theta) = \left\{ \begin{array}{ll} w(1, \theta), & \text{if }abs(w(1, \theta)) < \sigma(w(1,\theta)); \\ 0, & \text{otherwise}. \end{array} \right. \end{array} $$

where y represents the set of the watermarked coefficients and δ is a scalar used to control the watermark strength. abs represents the absolute value and σ(.) is the standard deviation.
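The additive rule (9), together with the thresholding that defines w′, can be sketched as follows for one sub-band. The sketch assumes that the host sub-band, the mask and the watermark sub-band have already been computed and share the same shape. The function names are hypothetical, and leaving a part unscaled when its maximum is zero is an added safeguard rather than part of the text.

```python
import numpy as np


def threshold_watermark(w):
    """w' of the text: keep coefficients whose magnitude is below the sub-band std."""
    w = np.asarray(w, dtype=complex)
    return np.where(np.abs(w) < np.std(w), w, 0)


def _scale_by_max(part):
    """Divide by the maximum value, as in Eq. (9); unchanged if the maximum is 0."""
    peak = part.max()
    return part / peak if peak != 0 else part


def embed_subband(x, vm, w, delta):
    """Eq. (9): add the masked, scaled watermark to real and imaginary parts."""
    x = np.asarray(x, dtype=complex)
    vm = np.asarray(vm, dtype=complex)
    wp = threshold_watermark(w)
    real = x.real + delta * vm.real * _scale_by_max(wp.real)
    imag = x.imag + delta * vm.imag * _scale_by_max(wp.imag)
    return real + 1j * imag
```

Setting `delta` to zero returns the host coefficients unchanged, which is a convenient sanity check when tuning the watermark strength.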

Fig. 3 1-level DTCWT decomposition applied to the watermark W [9]

The role of a watermark detector is to verify whether the input image contains the candidate watermark. Watermark detection can be viewed as a problem of detecting a known signal in a noisy environment, where the host coefficients represent the noisy channel and the watermark is the signal to be detected. In this paper, an efficient watermark detector is adopted. The structure of this detector relies on the Rao-test which is based on a binary hypothesis test. Two hypotheses are formulated to describe the presence or absence of the candidate watermark W in the data under test. The two hypotheses are: the null hypothesis H0 (the claimed watermark is not present) and the alternative one H1 (the host data carries the claimed watermark). Furthermore, the performance of the detector depends on the statistical modeling of the host data. As pointed out by Kwitt et al. [21], a good statistical approximation of DTCWT coefficients can be obtained by adaptively varying two parameters of the GGD, which is given by

$$ f_{X}(x;\alpha,\beta)=\frac{\beta}{2\alpha {\Gamma}(1/\beta)}\exp \left( -\left( \frac{|x|}{\alpha}\right)^{\beta}\right) $$
(10)

where Γ(.) is the Gamma function, \({\Gamma }(z)={\int }_{0}^{\infty }e^{-t}t^{z-1}dt\), z > 0. The parameter α is referred to as the scale parameter and it models the width of the pdf peak (standard deviation) and β is called the shape parameter and it is inversely proportional to the decreasing rate of the peak.
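For reference, the density in (10) can be evaluated with a few lines of Python (the function name is hypothetical). Two special cases are useful for checking an implementation: β = 2 gives a Gaussian with standard deviation α/√2, and β = 1 gives a Laplacian.

```python
import math

import numpy as np


def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density of Eq. (10) with scale alpha and shape beta."""
    x = np.asarray(x, dtype=float)
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * np.exp(-(np.abs(x) / alpha) ** beta)


# beta = 2 and alpha = sqrt(2) reproduce the standard normal density at 0
print(float(ggd_pdf(0.0, math.sqrt(2.0), 2.0)), 1.0 / math.sqrt(2.0 * math.pi))
```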

In detection theory, Kay [17] has proven that the Rao-test has an asymptotically optimal performance similar to that of the generalized likelihood ratio test (GLRT). In other words, under the assumption that the noise probability density function (pdf) is symmetric, the performance of the Rao-based detector is equivalent to that of GLRT-based one that is designed with a priori knowledge of the noise parameters.

It is worth mentioning that the optimum detector proposed by Nikolaidis and Pitas in [33] for additive watermarking in the DWT and DCT domain cannot be used here because the watermark is not bipolar since it is transformed in the DTCWT domain. We adopt in this work a Rao-based watermarking detector that considers GGD for modeling DTCWT coefficients and takes into account the normal distribution of the DTCWT transformed watermark. The detector response is given by the following equation [21]

$$ \rho = \frac{\left[{\sum}_{i = 1}^{N}w^{\prime *}_{i}sign(y_{i})|y_{i}|^{\beta - 1} \right]^{2}}{\frac{1}{N}{\sum}_{i = 1}^{N}w_{i}^{\prime *2}({\sum}_{i = 1}^{N}|y_{i}|^{2\beta - 2})} $$
(11)

In our case, y represents the high frequency DTCWT coefficients and w′∗ is the set of the high frequency DTCWT coefficients of the candidate watermark W. Without loss of generality, y and w′∗ are assumed to be vectors of length N.
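The detector response (11) translates directly into Python. The sketch below assumes that y and w′∗ have been flattened into real vectors of equal length (for instance by concatenating real and imaginary parts) and that no coefficient is exactly zero when β < 1. The function name is hypothetical.

```python
import numpy as np


def rao_statistic(y, w, beta):
    """Detector response rho of Eq. (11) for real vectors y and w of length N."""
    y = np.asarray(y, dtype=float).ravel()
    w = np.asarray(w, dtype=float).ravel()
    n = y.size
    score = np.sign(y) * np.abs(y) ** (beta - 1.0)
    numerator = np.sum(w * score) ** 2
    denominator = (np.sum(w ** 2) / n) * np.sum(np.abs(y) ** (2.0 * beta - 2.0))
    return numerator / denominator
```

The response is then compared with the detection threshold to decide between H0 and H1.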

In the literature [18, 21, 22, 33], it is well established that the detection response ρ follows a Chi-square distribution with one degree of freedom (\({\chi _{1}^{2}}\)) under hypothesis H0, whereas under hypothesis H1, it follows a Non-Central Chi-square distribution with one degree of freedom and non-centrality parameter Λ (\({\chi _{1,{\Lambda }}^{2}}\)), as shown in Fig. 4. Based on these characteristics, the detection threshold TRao can be defined based on a desired \(P_{FA}^{*}\) as follows

$$ T_{Rao} = 2(erfc^{-1}(P_{FA}^{*}))^{2} $$
(12)

where erfc(.) is the complementary error function, given by \(erfc(x) = \frac{2}{\sqrt{\pi}}{\int }_{x}^{\infty }e^{-t^{2}}dt\).
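Because erfc⁻¹(p) = Q⁻¹(p/2)/√2, the threshold in (12) is simply the square of the standard normal quantile at 1 − P*_FA/2, that is, the (1 − P*_FA) quantile of the Chi-square distribution with one degree of freedom. A small Python sketch using only the standard library (the function name is hypothetical):

```python
import math
from statistics import NormalDist


def rao_threshold(p_fa):
    """Threshold of Eq. (12): T = 2 * (erfcinv(p_fa))**2 = (Q^{-1}(p_fa / 2))**2."""
    z = NormalDist().inv_cdf(1.0 - p_fa / 2.0)
    return z * z


t = rao_threshold(0.05)
print(round(t, 4))                      # 3.8415
print(math.erfc(math.sqrt(t / 2.0)))    # recovers the requested false alarm rate
```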

Fig. 4 Exemplary histogram for the detection response ρ under H0 and H1

The probability of detection of a watermark (PDet = Prob(ρ > TRao|H1)) is defined for a given \(P_{FA}^{*}\) by [21]

$$ P_{Det} = \text{Q}(\text{Q}^{-1}(P_{FA}^{*}/2) - \sqrt{\Lambda}) + \text{Q}(\text{Q}^{-1}(P_{FA}^{*}/2) + \sqrt{\Lambda}) $$
(13)

where Q(.) is the Q-function, i.e. the tail probability of the standard normal distribution.
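The detection probability is the upper tail of a non-central Chi-square variable with one degree of freedom, which can be written with two Q-function terms: P(ρ > T | H1) = Q(√T − √Λ) + Q(√T + √Λ), with √T = Q⁻¹(P*_FA/2). The Python sketch below evaluates this expression with the standard library; the function names are hypothetical. With Λ = 0 the result must equal the false alarm probability, which is a simple consistency check.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_function(x):
    """Tail probability of the standard normal distribution."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inverse(p):
    """Inverse of the Q-function for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def detection_probability(p_fa, noncentrality):
    """P(rho > T | H1) for a non-central chi-square with one degree of freedom."""
    root_t = q_inverse(p_fa / 2.0)
    root_nc = noncentrality ** 0.5
    return q_function(root_t - root_nc) + q_function(root_t + root_nc)
```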

Inspection of the Rao detector structure shows that only one parameter (i.e., the shape parameter β) has to be estimated directly from the watermarked coefficients. However, as mentioned in [22, 33], the detector given by (11) is asymptotically optimal, which means that the host data needs to be adequately large.

4 Experimental results

In this section, intensive experiments have been conducted to evaluate the performance of the proposed watermarking system on a set of test images. In all experiments, standard grayscale and color images of size 512 × 512 have been used. In particular, results on six grayscale and two color images, as shown in Figs. 5 and 6, respectively, are reported. Note that these images are of different contents and cover a good range of the frequency content that natural images normally carry (i.e., textured, edged, and smooth images). For color images, the luminance plane has been selected to hold the watermark to ensure robustness against color manipulations. In our analysis, the following blind additive watermarking systems have been considered for reference

  • Cheng et al. [8]: In their work, a perceptual model constrained approach to information hiding in the DWT and the DCT domains is proposed. In this paper, the DWT-based and the DCT-based models are referred to as Cheng (DWT) and Cheng (DCT), respectively.

  • Kwitt et al. [21]: In their work, a watermark detection structure has been proposed in the DTCWT based on the Rao-test, where no perceptual model has been used. It is referred to as Kwitt (DTCWT) in this paper.

  • Asikuzzaman et al. [5]: Their work builds upon the idea published in [9] where a perceptual mask is used in the embedding phase and the detection relies on an inverse mask to decode the watermark. The correlation is then used to verify the presence of the candidate watermark. The main difference from [9] is that in [5] the chrominance plane is used to enhance watermark imperceptibility and the watermark is embedded in high frequency DTCWT coefficients. This system is referred to as Asikuzzaman (DTCWT).

Fig. 5 Grayscale test images

Fig. 6 Color test images

Four aspects are considered in our experiments: (i) the imperceptibility of the hidden watermark, (ii) the detection performance in the absence of attacks, (iii) the robustness of the watermark against common signal processing attacks, and (iv) the computational complexity of the embedding and the detection processes.

4.1 Imperceptibility analysis

First, watermark invisibility is assessed. In Fig. 7, the original images are displayed along with their watermarked versions with a PSNR close to 45 dB. As can be seen, the images are visually indistinguishable, thus demonstrating the effectiveness of DTCWT watermarking and the perceptual masking scheme. Fig. 7 also shows the absolute difference between the original images and the watermarked ones, magnified by a factor of 5. The watermark is concentrated mainly in high activity regions and around edges. This suggests that edged and textured images are more suitable for watermarking than smooth and low activity images. Next, an objective evaluation of watermark imperceptibility is performed using two well-known measures: the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity (SSIM) index [41]. Each test image has been watermarked using 2000 randomly generated watermarks and the average values of PSNR and SSIM are reported. To make the comparison as fair as possible, the watermark strength has been set to obtain approximately the same value of the Document-to-Watermark Ratio (DWR) for the competing techniques. Figures 8 and 9 show the obtained PSNR and SSIM of the test images, respectively, for different values of DWR. As can be seen, the proposed system clearly outperforms the other systems in terms of imperceptibility on both grayscale and color images.
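For readers who want to reproduce this kind of measurement, a minimal Python sketch of the two ratios is given below. PSNR uses the usual 8-bit peak of 255. The text does not spell out its DWR formula, so the variance ratio in decibels used here is the common definition and should be read as an assumption, as should the function names.

```python
import numpy as np


def psnr(original, watermarked, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal size."""
    diff = np.asarray(original, dtype=float) - np.asarray(watermarked, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def dwr(host, watermark_signal):
    """Document-to-Watermark Ratio in dB (assumed: ratio of variances)."""
    return float(10.0 * np.log10(np.var(host) / np.var(watermark_signal)))
```

In practice the watermark strength δ would be adjusted until the measured DWR matches the target value, and PSNR and SSIM would then be averaged over the generated watermarks.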

Fig. 7 Imperceptibility evaluation for gray images: (a) original images, (b) watermarked images, and (c) absolute difference between the original images and the watermarked ones, magnified by a factor of 5

Fig. 8 PSNR values for different values of DWR

Fig. 9 SSIM values for different values of DWR

4.2 Detection performance

In order to evaluate the performance of the watermark detection, Receiver Operating Characteristic (ROC) curves were used. These curves represent the variation of PDet against the theoretical \(P_{FA}^{*}\). To obtain the ROC curves, the test images have been watermarked by 10000 randomly generated watermarks. For each tested system, the strength of the watermark is set to obtain a PSNR value of ≈ 60 dB for Baboon and ≈ 65 dB for the other images.

First, experiments have been performed to evaluate the impact of the shape parameter β on the detection performance. To do so, the performance of the proposed system has been assessed with different values of β. These values are either fixed, chosen from the set {0.5, 0.8, 1, 1.2}, or estimated from the watermarked coefficients according to the Maximum-Likelihood Estimation (MLE) method described in [11]. It is worth mentioning that the idea of using fixed values for the shape parameter β was first proposed by Hernandez et al. [13] in the DCT and DWT domains. Figure 10 shows the obtained ROC curves. Interestingly, the best detection performance is obtained with fixed parameter settings (i.e., β = 1.2 for Pepper and Baboon and β = 1 for the other images). In fact, the performance obtained when MLE was used to estimate β was lower for all test images. As a result, it is sensible to use a fixed value of β since this yields better detection performance than that obtained with MLE. Furthermore, this removes the computational cost of estimating β with MLE at the detection stage.

Fig. 10 ROC curves for the proposed technique, obtained for different values of the shape parameter β

The second set of experiments has been conducted to evaluate the detection performance of the proposed system in the absence of attacks, in comparison with the aforementioned systems. As can be seen from Fig. 11, the proposed system outperforms the competing techniques for almost all test images. The worst results were obtained for the system of Asikuzzaman et al. [5]. Such poor performance was expected since this system uses a correlation-based detector, which is optimal only when the host data follows a Gaussian distribution.

Fig. 11 Detection performance in the absence of attacks

To validate the superiority of the proposed system over its competitors, we have calculated the Equal Error Rate (EER); the obtained results are reported in Table 2. As can be noted, the proposed system performs better than the competing techniques.
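The EER is commonly taken as the operating point of the ROC curve at which the false alarm probability equals the miss probability (1 − PDet). The Python sketch below estimates it from sampled ROC points by linear interpolation. The function name and the interpolation scheme are illustrative assumptions, not the procedure used to produce Table 2.

```python
import numpy as np


def equal_error_rate(p_fa, p_det):
    """Estimate the EER from ROC samples by linear interpolation.

    p_fa  : false alarm probabilities of the sampled operating points
    p_det : matching detection probabilities
    """
    p_fa = np.asarray(p_fa, dtype=float)
    p_miss = 1.0 - np.asarray(p_det, dtype=float)
    order = np.argsort(p_fa)
    p_fa, p_miss = p_fa[order], p_miss[order]
    gap = p_fa - p_miss  # negative before the crossing, positive after it
    crossing = np.flatnonzero(gap >= 0)
    if crossing.size == 0:
        raise ValueError("the ROC samples never reach P_FA >= 1 - P_Det")
    k = crossing[0]
    if gap[k] == 0:
        return float(p_fa[k])
    if k == 0:
        raise ValueError("the crossing lies before the first ROC sample")
    t = -gap[k - 1] / (gap[k] - gap[k - 1])
    return float(p_fa[k - 1] + t * (p_fa[k] - p_fa[k - 1]))
```

A lower EER indicates a better trade-off between false alarms and missed detections.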

Table 2 Values of the equal error rate (EER)

4.3 Robustness analysis

The robustness of the proposed scheme against some image processing operations and geometric attacks is assessed. To this end, a set of 10000 randomly generated watermarks is embedded into each test image, and each attack is then applied with a fixed strength to the watermarked images. In all tests, the watermark strength is set to obtain a PSNR around 55 dB. In this section, only the results obtained on Lena (grayscale image) and Barbara (color image) are reported because similar findings have been reached on the remaining test images.

4.3.1 Robustness against image processing

We have evaluated the robustness of the proposed system against JPEG and JPEG-2000 compression, mean filtering and additive white Gaussian noise (AWGN). The watermarked images have been altered by applying attacks with a fixed strength as follows: JPEG compression with a quality factor of 30, JPEG-2000 compression with a ratio of 16, mean filtering with a filter size of 5 × 5, and AWGN with SNR = 15 dB. The obtained results are shown in Figs. 12 and 13. These results clearly show that the proposed system provides superior performance, especially in the presence of JPEG compression and mean filtering.
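Two of these attacks are easy to reproduce with NumPy alone, as sketched below. The SNR is taken here as the ratio of mean signal power to noise power, and the borders of the mean filter are handled by edge replication. Both are assumptions, as are the function names; the compression attacks require an image codec and are not covered.

```python
import numpy as np


def add_awgn(image, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = snr_db."""
    image = np.asarray(image, dtype=float)
    signal_power = np.mean(image ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise


def mean_filter(image, size=5):
    """size x size averaging filter with replicated borders."""
    image = np.asarray(image, dtype=float)
    half = size // 2
    padded = np.pad(image, half, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for dr in range(size):
        for dc in range(size):
            out += padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
    return out / size ** 2


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(256, 256))  # stand-in for a watermarked image
noisy = add_awgn(image, snr_db=15.0, rng=rng)
smoothed = mean_filter(image, size=5)
```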

Fig. 12 ROC curves for the gray image of Lena after: (a) JPEG compression with a quality factor = 30, (b) JPEG 2000 compression with a compression ratio = 16, (c) mean filtering with a filter size of 5 × 5 and (d) AWGN with SNR = 15 dB

Fig. 13 ROC curves for the color image of Barbara after: (a) JPEG compression with a quality factor = 30, (b) JPEG 2000 compression with a compression ratio = 16, (c) mean filtering with a filter size of 5 × 5 and (d) AWGN with SNR = 15 dB
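Two of the image processing attacks above are simple enough to sketch directly. The following example shows one possible way to apply AWGN at a prescribed SNR and a 5 × 5 mean filter to a grayscale image stored as a list of rows. Several choices here are assumptions rather than details taken from the experiments: the signal power is taken as the mean squared pixel value, border pixels are averaged over the part of the window that lies inside the image, and the function names are illustrative.

```python
import math
import random


def add_awgn(image, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power ~ snr_db."""
    pixels = [p for row in image for p in row]
    # Assumption: signal power is the mean squared pixel value.
    signal_power = sum(p * p for p in pixels) / len(pixels)
    noise_std = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [[p + rng.gauss(0.0, noise_std) for p in row] for row in image]


def mean_filter(image, size=5):
    """Average each pixel over a size x size window clipped at the borders."""
    height, width = len(image), len(image[0])
    radius = size // 2
    filtered = []
    for i in range(height):
        row = []
        for j in range(width):
            window = [
                image[y][x]
                for y in range(max(0, i - radius), min(height, i + radius + 1))
                for x in range(max(0, j - radius), min(width, j + radius + 1))
            ]
            row.append(sum(window) / len(window))
        filtered.append(row)
    return filtered


# Synthetic 64 x 64 test pattern, used only to exercise the functions.
rng = random.Random(0)
image = [[100 + (i + j) % 50 for j in range(64)] for i in range(64)]
noisy = add_awgn(image, 15.0, rng)
smoothed = mean_filter(image, 5)
```

JPEG and JPEG 2000 compression are not sketched here because they require a codec implementation.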

4.3.2 Robustness against geometric attacks

Two geometric attacks are considered: image cropping and translation. In our experiments, cropping is implemented as shown in Fig. 14, while a horizontal shift is applied to the watermarked images as shown in Fig. 15. The corresponding results are given in Figs. 16 and 17, respectively. As can be seen, the proposed system is more robust to image cropping and translation than the competing techniques.

Fig. 14 Cropped images: (a) Lena (gray image), (b) Barbara (color image). The size of the cropped image is 300 × 300

Fig. 15 Shifted images: (a) Lena (gray image), (b) Barbara (color image). Images are horizontally translated by 5 pixels

Fig. 16 ROC curves after applying cropping for images: (a) Lena (gray image), and (b) Barbara (color image)

Fig. 17 ROC curves after applying translation for images: (a) Lena (gray image), and (b) Barbara (color image)
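The two geometric attacks can be sketched as follows for a grayscale image stored as a list of rows. The captions give only the cropped size (300 × 300) and the shift (5 pixels), so the remaining details are assumptions made for this example: the crop is taken from the centre of the image, the shift is to the right, and the vacated columns are filled with zeros rather than wrapped around.

```python
def crop_center(image, size=300):
    """Return the central size x size block (the position is an assumption)."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def shift_right(image, shift=5, fill=0):
    """Translate horizontally by `shift` pixels, padding vacated columns with `fill`."""
    width = len(image[0])
    return [[fill] * shift + row[:width - shift] for row in image]


# Synthetic 512 x 512 image whose pixel value encodes its position.
image = [[i * 512 + j for j in range(512)] for i in range(512)]
cropped = crop_center(image, 300)
shifted = shift_right(image, 5)
```

A cyclic shift or a different crop position would be equally valid readings of the attack descriptions and would only change the two helper functions.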

4.4 Computational complexity

In this subsection, a set of experiments is conducted to analyze the computational complexity of the proposed system and its competitors. For each system, the run time was recorded during the embedding and detection stages on 6 gray-level test images of size 512 × 512, in which a watermark was embedded and detected 1000 times. All source codes were implemented in MATLAB and run on an Intel Core(TM) i5-3230M CPU at 2.60 GHz with 4 GB of memory. The average CPU time is listed in Table 3 for each technique. It can be seen that the proposed system takes more time to watermark an image than the competing techniques. This is mainly attributed to the significant computations required for estimating the perceptual mask, in addition to the use of complex numbers in the DTCWT structure. It is, however, worth mentioning that the main computational component of the proposed system is the perceptual masking process, which involves estimating the local brightness mask and the texture mask separately. Since these two masks do not depend on each other, they can be computed in parallel. Moreover, because the texture masking process is repeated independently for the real and imaginary subbands (see (3)), these computations can also be executed in parallel to speed up the process.

On the other hand, it can be seen that watermark detection with the proposed system is significantly faster and, along with the system proposed in [21], is the most efficient. In this context, it is worth noting that the run time of the embedding process is less critical than that of the detection process, since embedding can be performed offline. The watermark detection stage, however, is crucial as it requires a decision on the presence of the watermark.

Table 3 Average CPU time (in seconds) for watermarking techniques
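The measurement protocol described above (repeating an operation many times and reporting the average CPU time) can be sketched as follows. The experiments themselves were run in MATLAB; this Python version is only an illustrative stand-in, and `average_cpu_time` and the dummy workload are hypothetical names introduced for the example.

```python
import time


def average_cpu_time(operation, runs=1000):
    """Average CPU time in seconds of calling `operation()` `runs` times."""
    start = time.process_time()  # CPU time of the current process
    for _ in range(runs):
        operation()
    return (time.process_time() - start) / runs


calls = []


def dummy_workload():
    """Stand-in for one embedding or detection run."""
    calls.append(sum(range(1000)))


average_seconds = average_cpu_time(dummy_workload, runs=50)
```

Measuring CPU time rather than wall-clock time makes the average less sensitive to other processes running on the same machine.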

5 Conclusions

This paper proposes a blind additive image watermarking scheme in the DTCWT domain. To enhance imperceptibility, a new visual masking model exploiting the HVS characteristics has been used. The watermark detector is an adapted version of the Rao-test based detector. The host data in which the watermark is embedded (i.e., the high-frequency DTCWT coefficients) are modeled by the generalized Gaussian distribution (GGD). Experimental results have shown that the proposed visual masking significantly enhances the performance of the system in terms of imperceptibility, detection accuracy and robustness to common attacks when compared with recent state-of-the-art techniques. Furthermore, we have found that the maximum likelihood estimate (MLE) of the GGD shape parameter does not provide good detection performance in most cases, and that a fixed shape parameter can offer better results. In the future, it would be sensible to extend this work and use HVS-based masking models for multi-bit watermarking in the DTCWT domain. This would serve other practical applications of watermarking such as covert communication and source tracking. The optimization of the watermark embedding process would also be of interest to the authors.