1 Introduction

Digital images are an essential medium for expressing and communicating information. Digital imaging has been applied in many fields, but image quality is inevitably degraded during image collection, compression [1–3], transmission [4], processing [5], and reconstruction [6, 7]. The accurate assessment of image quality has also become challenging [8]. As such, image quality assessment (IQA) has been extensively investigated [9–11].

IQA can be divided into full-reference (FR), reduced-reference (RR), and no-reference (NR) assessments [12] based on the availability of reference images. The FR IQA methods take the original image as the reference image. They are mainly used to assess the similarity and fidelity between a distorted image and the original undistorted image [13, 14]. The RR IQA methods are practical when only some extracted features, rather than the whole original image, are accessible [15]. These features can be used to give a reasonable estimate of the distorted image's quality [16]. In some practical applications, no reference image is available for comparison. Therefore, the NR IQA methods are needed [17]. This study focuses on FR IQA methods.

MSE and PSNR are widely used FR IQA methods. In these methods, image quality is assessed by calculating the pixel-wise error over the whole image, and the average error is used as the final assessment result. These methods offer several advantages, such as simple calculation and easy implementation. However, because the model is so simple, it captures the image content only superficially. The absolute error between the pixels of two images is calculated, but the correlation between pixels and the perceptual characteristics of the human visual system (HVS) are disregarded. Low-level features, such as edge information, are not described either. As a result, the assessed results can be seriously inconsistent with HVS perception and with the actual image quality [18, 19].
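The limitation described above can be illustrated with a minimal sketch. The function names `mse` and `psnr`, the list-of-rows image representation, and the 8-bit peak value of 255 are assumptions made for illustration only.

```python
import math

def mse(ref, dist):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    total, count = 0.0, 0
    for row_r, row_d in zip(ref, dist):
        for p_r, p_d in zip(row_r, row_d):
            total += (p_r - p_d) ** 2
            count += 1
    return total / count

def psnr(ref, dist, peak=255.0):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(ref, dist)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

# Two very different distortions with the same average error.
reference = [[0, 0], [0, 0]]
uniform_shift = [[10, 10], [10, 10]]   # every pixel changed slightly
single_spike = [[20, 0], [0, 0]]       # one pixel changed strongly
```

Both distorted images have an MSE of 100 and therefore the same PSNR, although the errors are distributed in completely different ways. This is the sense in which pixel-wise averages ignore image structure.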

Many representative assessment methods have been proposed to adapt to human visual characteristics. Wang et al. [12] established the Structural SIMilarity (SSIM) model, the best-known representative, which builds on the universal image quality index (UQI) [20]. SSIM uses the structural information of images to assess quality. Experiments show that SSIM is more appropriate than previous assessment methods. Although SSIM improves the congruency between assessment results and HVS perception, its structural features remain scalar, which causes SSIM to lose validity when images are highly blurred. Numerous methods, such as MS-SSIM [21], ESSIM [22], GSSIM [23], 3-SSIM [24], CW-SSIM [25], and IW-SSIM [26], have been developed on the basis of SSIM, and these methods improve the assessment results to a certain extent. Sheikh et al. [27, 28] also developed methods, such as IFC and VIF, based on natural scene statistics (NSS) to introduce the concept of information fidelity. Zhang et al. [29] proposed a Feature SIMilarity (FSIM) method that introduces phase congruency (PC) and gradient magnitude (GM) similarity as assessment features.

Further research has shown that natural images, as two-dimensional signals characterized by highly structured features, have a vector nature. The pixels of images show a strong dependency, which constitutes the structure of the two-dimensional image. The main function of HVS is to obtain structural information from the field of view. Because of the good performance of the Riesz transform in multidimensional signal processing, Zhang et al. [30] constructed similarity matrices by using the characteristic maps of first- and second-order Riesz transforms and utilized edge features as the pooling function to derive the RFSIM index. Luo et al. [31] introduced monogenic phase congruency (MPC) based on PC and proposed the RMFSIM method. With these methods, the vector characteristics of two-dimensional images can be assessed more efficiently. However, these methods simply apply the Riesz transform to construct local features and only partially consider the physical meaning of monogenic signal (MS) theory. Moreover, these assessment factors describe only high-frequency information, such as edge features. The complexity of HVS has yet to be fully captured. Hence, there is still much room for improvement.

In this study, an FR IQA method called Riesz transform and Visual contrast sensitivity-based feature SIMilarity index (RVSIM) is proposed by combining the Riesz transform with visual contrast sensitivity. The Log-Gabor filter and the contrast sensitivity function (CSF) are both well known. However, to the best of our knowledge, we are the first to combine the frequency characteristics of the Log-Gabor filter with the frequency-sensitive features of HVS, so that the objective and subjective evaluation results are as consistent as possible. In addition, although the Riesz transform performs well in multidimensional signal processing, the first-order Riesz transform cannot clearly express the corners and intersection points in the image. The proposed RVSIM method introduces the GM similarity and thus improves assessment performance. In general, RVSIM takes full advantage of the MS theory [32] and Log-Gabor filter [33] by exploiting visual CSF [34] to allocate the weights of different frequency bands. The similarity matrix is obtained by introducing GM, and the MPC map is utilized as a pooling function to derive the final IQA score. Two groups of simulated experiments were carried out with two kinds of databases. The first kind comprises the LIVE, CSIQ, TID2008, and TID2013 databases, on which performance is assessed mainly by calculating the absolute indicators of the method. The second kind is the Waterloo Exploration database, on which performance is assessed mainly by calculating the competitive ranking among methods. The experimental results demonstrate that the proposed RVSIM method is a robust IQA method.

Notably, RVSIM is different from RFSIM [30] and RMFSIM [31] in four aspects. First, RVSIM employs Log-Gabor band-pass filters on the reference and distorted images to obtain the components of images in different frequency bands. Second, RVSIM does not directly use the Riesz transform to determine the feature matrix. Instead, RVSIM utilizes the analytic space obtained by Riesz transform, including local amplitude, phase, and direction, which constitute a complete orthogonal basis [35], and subsequently calculates local feature similarities. Third, RVSIM applies the characteristics of HVS to assign different weights to various frequency bands. In this manner, the RVSIM model has appropriate congruency with the perceptive characteristics of the HVS. Fourth, RVSIM introduces the GM similarity and demonstrates that the first-order Riesz transform cannot clearly express the corners and intersection points in images.

The remaining parts of this paper are organized as follows: Section 2 presents the MS theory, Log-Gabor filter, MPC, and visual contrast sensitivity, and gives the detailed design ideas and calculation process for the specific application of these theories in this study. Section 3 introduces the structure of the new IQA method proposed in this study and describes the combination of MS, CSF, GM, and MPC to derive the RVSIM index. Section 4 presents the experimental results. Section 5 draws the conclusion.

2 Related works

2.1 Riesz transform

In one-dimensional signal processing, the Hilbert transform has been proven to be effective. However, attempts to extend the Hilbert transform to two-dimensional images, including the local Hilbert transform, the overall Hilbert transform, and the local and global Hilbert transform [36], have all failed because they share a common flaw: they are not isotropic [37]. The Riesz transform extends the Hilbert transform to higher-dimensional Euclidean space, which makes it suitable for image processing applications [38, 39].

Figure 1 shows that the Riesz transform space is a spherical coordinate system in a 3D Euclidean space. \(R\), \(R_{1}\), and \(R_{2}\) are the projections of the points in the spherical coordinate system on the three axes [40]. In this spatial domain, the local amplitude A, the local direction θ, and the local phase φ can be expressed as:

$$ \begin{aligned} \left\{\begin{array}{lll} A_{R}(x,y) &= \sqrt{R(x,y)^{2}+R_{1}(x,y)^{2}+R_{2}(x,y)^{2}} \\[0.2cm] \theta_{R}(x,y) &= \tan^{-1}{(-R_{2} (x,y)/R_{1} (x,y))} \\[0.2cm] \varphi_{R} (x,y) &= \tan^{-1} {(R_{12}(x,y)/R(x,y))} \end{array}\right. \end{aligned} $$
(1)
Fig. 1 The Riesz transform space

where \(R_{12}(x, y) = \sqrt {R_{1}(x, y)^{2} + R_{2}(x, y)^{2}}, \theta _{R}(x, y) \in [0, \pi), \varphi _{R}(x,y) \in [0, \pi)\).
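Eq. (1) can be evaluated pixel by pixel once the three monogenic components are known. The following is a minimal sketch under that assumption; the function name `monogenic_features` is hypothetical, and `atan2` is used in place of the plain arctangent so that a zero denominator does not cause a division error.

```python
import math

def monogenic_features(r, r1, r2):
    """Local amplitude, direction and phase at one pixel, following Eq. (1)."""
    r12 = math.hypot(r1, r2)
    amplitude = math.sqrt(r * r + r1 * r1 + r2 * r2)
    # atan2 avoids dividing by zero; the modulo folds the angle into [0, pi)
    # (up to floating-point rounding).
    direction = math.atan2(-r2, r1) % math.pi
    # r12 >= 0, so atan2 returns a value in [0, pi]; pi itself occurs only
    # in the degenerate case r12 == 0 and r < 0.
    phase = math.atan2(r12, r)
    return amplitude, direction, phase
```

For example, a pixel with components (0, 1, 0) has amplitude 1, direction 0, and phase π/2, which corresponds to a purely odd (edge-like) local structure.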

2.2 Log-Gabor filter

Given that the length of the image signal is limited, the image signal is band-pass filtered before the Riesz transform, usually with the Log-Gabor filter [41]. In practical applications, multiple Log-Gabor filters should be used to build a complete filter bank in the radial and horizontal directions because of the bandwidth limitation of a single Log-Gabor filter [42]. The optimum filter bank for a specific application can be established on the basis of previously described methods [43, 44]. In this study, the number of scales is \(n_{r}=5\) and the number of orientations is \(n_{\theta}=1\); the splicing parameters are discussed in detail in Section 4.1.

Section 2.4 shows that the center frequencies \(\omega_{0i}\) (i=1,…,5) of the filter bank are \(\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3^{2.1}}, \omega _{03}=\frac {1}{3^{2.1 \times 2.1}}, \omega _{04}=\frac {1}{3^{2.1 \times 2.1 \times 2.1}}\), and \(\omega _{05}=\frac {1}{3^{2.1 \times 2.1 \times 2.1 \times 2.1}}\). The bands of the Log-Gabor filter bank are [0.4786,0.2026], [0.2611,0.0965], [0.1243,0.0460], [0.0591,0.0221], and [0.0282,0.0105]. Using this filter bank, the image R is filtered to complete the five-scale decomposition of the image, and the decomposed images \(R^{bi}\) (i=1,…,5) are obtained. The MSs of the reference image \(\left [R^{bi}, R_{1}^{bi}, R_{2}^{bi}\right ]~(i=1,\ldots,5)\) are obtained by applying the Riesz transform to \(R^{bi}\) (i=1,…,5). Thus, Eq. (1) becomes:

$$ \begin{aligned} \left\{\begin{array}{lll} A_{R}^{bi}(x,y) &= \sqrt{R^{bi}(x,y)^{2}+R_{1}^{bi}(x,y)^{2}+R_{2}^{bi}(x,y)^{2}} \\[0.2cm] \theta_{R}^{bi}(x,y) &= \tan^{-1} {\left(-R_{2}^{bi}(x,y)/R_{1}^{bi}(x,y)\right)} \\[0.2cm] \varphi_{R}^{bi}(x,y) &= \tan^{-1} {\left(R_{12}^{bi}(x,y)/R^{bi}(x,y)\right)} \end{array}\right. \end{aligned} $$
(2)

where \(R_{12}^{bi}(x, y) = \sqrt {R_{1}^{bi}(x, y)^{2} + R_{2}^{bi}(x, y)^{2}}, \theta _{R}^{bi}(x, y) \in [0, \pi), \varphi _{R}^{bi}(x, y) \in [0, \pi), i=1,\ldots,5\). Similarly, the MS of the distorted image is \(\left [D^{bi}, D_{1}^{bi}, D_{2}^{bi}\right ]~(i=1,\ldots,5)\) and the corresponding local amplitude \(A_{D}^{bi}\), the local direction \(\theta _{D}^{bi}\), and the local phase \(\varphi _{D}^{bi}, i=1,\ldots,5\).

In this study, the Log-Gabor filter bank is shown in Fig. 2. The center frequencies \(\omega_{0i}\) (i=1,…,5) from Fig. 2a–e are \(\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3^{2.1}}, \omega _{03}=\frac {1}{3^{2.1 \times 2.1}}, \omega _{04}=\frac {1}{3^{2.1 \times 2.1 \times 2.1}}\), and \(\omega _{05}=\frac {1}{3^{2.1 \times 2.1 \times 2.1 \times 2.1}}\). Using this Log-Gabor filter bank, two sample images (monarch and sailing2 in the LIVE database [45]) are filtered to obtain the different components of the corresponding five bands. Notably, the sample images are converted to grayscale before filtering.

Fig. 2 Two examples and their Log-Gabor filter banks. a–e Forms of the filter bank. Their center frequencies are \(\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3^{2.1}}, \omega _{03}=\frac {1}{3^{2.1 \times 2.1}}, \omega _{04}=\frac {1}{3^{2.1 \times 2.1 \times 2.1}}\), and \(\omega _{05}=\frac {1}{3^{2.1 \times 2.1 \times 2.1 \times 2.1}}\), respectively. f–k The original image monarch and the different components of the corresponding five bands. l–q The original image sailing2 and the different components of the corresponding five bands

Figure 2 also shows that the Log-Gabor filter whose ω0 is set as \(\frac {1}{3}\) reflects the high-frequency components of the image, mainly representing the most detailed information of the original image. The Log-Gabor filter, whose ω0 is set as \(\frac {1}{3^{2.1}}\), reflects the sub-high frequency components of the image. The Log-Gabor filter whose ω0 is set as \(\frac {1}{3^{2.1 \times 2.1 \times 2.1}}\) contains a large number of low-frequency components, which mainly reflect the contour information of the original image. The detailed information describes the small-scale parts of the image such as texture, and the remaining large-scale information expresses the basic structure and the trend of the image.
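The band edges listed in this section can be sanity-checked with a small sketch of the radial Log-Gabor response. Three assumptions are made here that are not stated in the text: the center frequencies are read as the reciprocals of wavelengths that start at 3 pixels and grow by a factor of 2.1 per scale, the bandwidth ratio σ/ω0 is 0.55, and the band edges are the half-power points of each filter. The names `log_gabor` and `half_power_band` are hypothetical.

```python
import math

MIN_WAVELENGTH = 3.0   # assumed: wavelength of the finest scale, in pixels
MULT = 2.1             # assumed: wavelength multiplier between scales
SIGMA_ON_F = 0.55      # assumed: bandwidth ratio sigma / omega_0

def log_gabor(omega, omega0, sigma_on_f=SIGMA_ON_F):
    """Radial Log-Gabor transfer function; defined as zero at DC."""
    if omega <= 0:
        return 0.0
    return math.exp(-(math.log(omega / omega0) ** 2)
                    / (2.0 * math.log(sigma_on_f) ** 2))

def half_power_band(omega0, sigma_on_f=SIGMA_ON_F):
    """(upper, lower) frequencies at which the squared response falls to 1/2."""
    spread = math.exp(math.sqrt(math.log(2.0)) * abs(math.log(sigma_on_f)))
    return omega0 * spread, omega0 / spread

centers = [1.0 / (MIN_WAVELENGTH * MULT ** i) for i in range(5)]
bands = [half_power_band(w0) for w0 in centers]
```

Under these assumptions, the computed edges agree with the listed bands 2 to 5, and with the lower edge of band 1, to within about 0.001. The upper edge of band 1 is not reproduced by this simple model.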

2.3 Monogenic phase congruency

The traditional PC model [46] utilizes the phase information of the image and is widely used to detect the edges, key feature points, and symmetry of the image. However, noise interference, frequency spread, and other problems will occur [47, 48]. The MPC model developed based on the MS theory and PC can better express the local phase information of the image and improve computational efficiency and local feature accuracy [31].

According to Eq. (2), the sum of the local energy is:

$$ E^{'}(x, y) = \sqrt{R^{b}(x, y)^{2} + R_{1}^{b}(x, y)^{2}+R_{2}^{b}(x, y)^{2}} $$
(3)

where \(R^{b}(x, y) = \sum _{i=1}^{5}R^{bi}(x, y), R_{1}^{b}(x, y) = \sum _{i=1}^{5}R_{1}^{bi}(x, y)\), and \(R_{2}^{b} (x, y) = \sum _{i=1}^{5}R_{2}^{bi}(x, y)\).

The sum of the local amplitudes is:

$$ A^{'} (x, y) = \sum_{i=1}^{5}A^{bi}(x, y) $$
(4)

The MPC model is expressed as:

$$ MPC(x, y) = W(x, y) \left\lfloor 1-\xi \times \arccos\left(\frac{E^{'}(x, y)}{A^{'}(x, y)}\right) \right\rfloor \frac{\left\lfloor E^{'}(x, y) - T \right\rfloor}{A^{'}(x, y)+\varepsilon} $$
(5)

where ⌊ ⌋ indicates that the enclosed quantity is not permitted to become negative: it is kept when positive and set to zero otherwise. ξ is the gain coefficient, which is generally chosen as 1≤ξ≤2. T is the noise compensation factor. ε is a small positive constant, which is set as ε=0.0001. W(x,y) is the weight function, which applies a sigmoid (S-type growth curve) to the filter response spread [49]:

$$ W(x,y)=\frac{1}{1+\exp(g(c-s(x,y)))} $$
(6)

where c is the cutoff value of the filter response spread, below which the PC values become penalized, g is the gain factor that controls the sharpness of the cutoff, and s(x,y) is the spread function [31]. Here, we set g=1.8182 and c=1/3.
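Eqs. (3) to (6) can be combined into a per-pixel sketch. The values of g, c, and ε are taken from the text; the defaults ξ=1.5 and T=0 are assumptions chosen only for illustration, the spread value s(x,y) is taken as a given input, and all function names are hypothetical.

```python
import math

G = 1.8182      # gain factor g from the text
C = 1.0 / 3.0   # cutoff c from the text
EPS = 1e-4      # epsilon from the text

def weight(spread, g=G, c=C):
    """Sigmoid weight of Eq. (6) for a given filter response spread s(x, y)."""
    return 1.0 / (1.0 + math.exp(g * (c - spread)))

def energy_and_amplitude(components):
    """E' and A' (Eqs. 3 and 4) from per-scale (R, R1, R2) triples at one pixel."""
    sum_r = sum(t[0] for t in components)
    sum_r1 = sum(t[1] for t in components)
    sum_r2 = sum(t[2] for t in components)
    energy = math.sqrt(sum_r ** 2 + sum_r1 ** 2 + sum_r2 ** 2)
    amplitude = sum(math.sqrt(r * r + r1 * r1 + r2 * r2)
                    for r, r1, r2 in components)
    return energy, amplitude

def clip_negative(value):
    """The floor-bracket operator: negative values are replaced by zero."""
    return max(value, 0.0)

def mpc(components, spread, xi=1.5, noise_t=0.0):
    """Monogenic phase congruency of Eq. (5) at one pixel."""
    energy, amplitude = energy_and_amplitude(components)
    if amplitude == 0.0:
        return 0.0  # no signal at any scale
    ratio = min(energy / amplitude, 1.0)  # guard against rounding above 1
    phase_term = clip_negative(1.0 - xi * math.acos(ratio))
    return (weight(spread) * phase_term
            * clip_negative(energy - noise_t) / (amplitude + EPS))
```

When the components of all scales point in the same direction, E' equals A' and the phase term is 1, so the MPC is limited only by the weight and the noise threshold. When the components cancel each other, E' is small relative to A' and the MPC drops to zero.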

Figure 3 shows the three-dimensional surface of W(x,y) to illustrate the weight function more intuitively. Two sample images (Fig. 3a, d, which are the same as Fig. 2f, l) in the LIVE database [45] are taken as examples. Figure 3b, e shows the three-dimensional surface of W(x,y). Figure 3c, f shows the rotated three-dimensional surface of W(x,y).

Fig. 3 Two sample images used for the weight function. These images are extracted from the LIVE database. a, d Reference image. b, e Three-dimensional surface of the weight function. c, f Rotated views of the three-dimensional surface

Figure 3 shows that the weight function accurately highlights the local characteristics in the sample image, indicating that the MPC can express the local phase information of the image.

2.4 Visual contrast sensitivity

Physiological and psychological research has revealed that HVS has many characteristics, such as the visual sensitivity band-pass effect, the visual nonlinearity effect, the visual multichannel effect, and the masking effect [50]. Among them, the CSF characterizes the HVS sensitivity band-pass effect, which reflects the difference in the sensitivity of HVS to different spatial frequencies. Given that CSF can be combined with subjective visual experience, it has been applied to many IQA methods [51, 52]. This study uses the CSF model proposed by Mannos et al. [34]:

$$ A(f_{r}) \approx 2.6(0.0192+0.114f_{r})\exp{\left(-(0.114f_{r})^{1.1}\right)} $$
(7)

where \(f_{r}\) is the spatial frequency. The normalized CSF characteristic curve is shown in Fig. 4.

Fig. 4 The visual CSF characteristic curve. The CSF curve is divided into five segments, which correspond to red, orange, green, cyan, and blue colors

To facilitate the calculation and adapt to the CSF, the center frequencies \(\omega_{0i}\) (i=1,…,5) of the Log-Gabor filter bank are set as \(\omega _{01} = \frac {1}{3}, \omega _{02} = \frac {1}{3^{2.1}}, \omega _{03} = \frac {1}{3^{2.1 \times 2.1}}, \omega _{04} = \frac {1}{3^{2.1 \times 2.1 \times 2.1}}\), and \(\omega _{05} = \frac {1}{3^{2.1 \times 2.1 \times 2.1 \times 2.1}}\). The CSF curve is divided into five segments, and the half-power points of each filter are taken as its band limits. The five bands of the Log-Gabor filter bank are then [0.4786,0.2026], [0.2611,0.0965], [0.1243,0.0460], [0.0591,0.0221], and [0.0282,0.0105], which correspond to the red, orange, green, cyan, and blue segments, respectively, in Fig. 4 (the overlap between the bands is not shown in the figure). The maximum value of the CSF in each band is used as the weight of the corresponding similarity matrix, giving \(w_{1}=0.3370\), \(w_{2}=0.8962\), \(w_{3}=0.9809\), \(w_{4}=0.9753\), and \(w_{5}=0.7411\).
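The band-pass shape of Eq. (7) can be checked numerically with a short sketch. The function name `csf` and the sampling grid (0 to 60 in steps of 0.1) are illustrative assumptions. The mapping between the normalized filter frequencies and \(f_{r}\) is not given in the text, so the sketch does not attempt to reproduce all five weights.

```python
import math

def csf(f_r):
    """Contrast sensitivity of Eq. (7) at spatial frequency f_r."""
    return 2.6 * (0.0192 + 0.114 * f_r) * math.exp(-((0.114 * f_r) ** 1.1))

# Sample the curve on an illustrative grid and locate its peak.
grid = [0.1 * k for k in range(601)]
values = [csf(f) for f in grid]
peak_value = max(values)
peak_frequency = grid[values.index(peak_value)]
```

On this grid the curve rises from about 0.05 at zero frequency to a peak of roughly 0.98 near \(f_{r} \approx 8\) and then decays, which is the band-pass behavior described above. The peak value is close to the largest weight, \(w_{3}=0.9809\).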

3 Proposed RVSIM method

3.1 The proposed framework

The framework of the proposed RVSIM method is shown in Fig. 5. The reference image R and the distorted image D are filtered by a five-band Log-Gabor filter bank to obtain the components \(R^{bi}\) and \(D^{bi}\) (i=1,…,5) in five different frequency bands. The MSs \(\left [R^{bi}, R_{1}^{bi}, R_{2}^{bi}\right ]\) and \(\left [D^{bi}, D_{1}^{bi}, D_{2}^{bi}\right ]~(i=1,\ldots,5)\) are obtained by applying the Riesz transform to the decomposed images. For each band, the similarity functions \(\left (S_{A}^{bi}, S_{\varphi }^{bi}, S_{\theta }^{bi}\right)~(i=1,\ldots,5)\) of the local features (local amplitude A, local phase φ, and local direction θ) are computed, and the similarity matrix \(S_{Mi}\) (i=1,…,5) is derived from them. The weights \(w_{i}\) (i=1,…,5) of the five similarity matrices are set using the CSF to obtain a single similarity matrix \(S_{M}\). The GM similarity matrix \(S_{G}\) of R and D is calculated. Then, \(S_{M}\) and \(S_{G}\) are combined to obtain the local feature similarity \(S_{L}\) of R and D. At the same time, the MPC is calculated from the MS of the reference image R to obtain the pooling function. Finally, the local feature similarity map \(S_{L}\) is weighted by the pooling function MPC and averaged to obtain the proposed similarity index.

Fig. 5 Illustration of the proposed RVSIM method

3.2 RVSIM index

As described previously, the reference image R and the distorted image D are subjected to a Log-Gabor filter bank and a first-order Riesz transform to obtain five MSs to calculate the characteristic indices in the Riesz transform space, including the amplitude A, phase φ, and direction θ. Then, the MS similarity of R and D at the pixel (x,y) is derived as:

$$ \begin{aligned} \left\{\begin{array}{lll} S_{A}^{bi}(x,y)&= \frac{2A_{R}^{bi}A_{D}^{bi}+C_{1}}{\left(A_{R}^{bi}\right)^{2}+\left(A_{D}^{bi}\right)^{2}+C_{1}} \\[0.2cm] S_{\theta}^{bi} (x,y)&= \exp\left(-\left| tan\left(\theta_{R}^{bi}-\theta_{D}^{bi}\right)\right|\right)\\[0.2cm] &= \exp\left(-\left| \frac{R_{1}^{bi} D_{2}^{bi}-R_{2}^{bi} D_{1}^{bi}}{R_{1}^{bi} D_{1}^{bi}+R_{2}^{bi} D_{2}^{bi}} \right|\right) \\[0.2cm] S_{\varphi}^{bi} (x,y) &= \exp\left(-\left| tan\left(\varphi_{R}^{bi}-\varphi_{D}^{bi}\right)\right|\right)\\[0.2cm] &= \exp\left(-\left| \frac{R^{bi} D_{12}^{bi}-R_{12}^{bi} D^{bi}}{R^{bi} D^{bi}+R_{12}^{bi} D_{12}^{bi}} \right|\right) \end{array}\right. \end{aligned} $$
(8)

where i=1,…,5, and C1 is a relatively small positive number.

The MS similarity matrix \(S_{Mi}\) of the i-th band is then constructed as:

$$ S_{Mi}=S_{A}^{bi}\cdot S_{\theta}^{bi}\cdot S_{\varphi}^{bi} $$
(9)

where i=1,…,5.
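Eqs. (8) and (9) can be written as a per-pixel sketch that uses the ratio forms directly, so no angle has to be computed. The function names are hypothetical, and the handling of a zero denominator (0/0 treated as no difference, x/0 as maximal difference) is an assumption that the text does not specify.

```python
import math

def ratio_similarity(num, den):
    """exp(-|num / den|), with the limiting cases handled explicitly."""
    if den == 0.0:
        # Assumption: 0/0 counts as "no difference", x/0 as maximal difference.
        return 1.0 if num == 0.0 else 0.0
    return math.exp(-abs(num / den))

def ms_similarity(ref, dist, c1):
    """S_A * S_theta * S_phi (Eqs. 8 and 9) at one pixel of one band.

    ref and dist are the (R, R1, R2) and (D, D1, D2) triples of that band.
    """
    r, r1, r2 = ref
    d, d1, d2 = dist
    a_r = math.sqrt(r * r + r1 * r1 + r2 * r2)
    a_d = math.sqrt(d * d + d1 * d1 + d2 * d2)
    r12, d12 = math.hypot(r1, r2), math.hypot(d1, d2)
    s_a = (2.0 * a_r * a_d + c1) / (a_r ** 2 + a_d ** 2 + c1)
    s_theta = ratio_similarity(r1 * d2 - r2 * d1, r1 * d1 + r2 * d2)
    s_phi = ratio_similarity(r * d12 - r12 * d, r * d + r12 * d12)
    return s_a * s_theta * s_phi
```

Identical components give a similarity of 1, whereas two pixels whose local directions are perpendicular give a direction similarity of 0 and therefore an overall similarity of 0 for that band.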

The weights of the five MS similarity matrices are set as \(w_{i}\) (i=1,…,5) using the CSF curve. The weighted sum is calculated to obtain the MS similarity matrix \(S_{M}\):

$$ S_{M} = \sum_{i=1}^{5}w_{i} S_{Mi} $$
(10)

Similar to previous studies [29, 53], the GM similarity is defined as:

$$ S_{G}(x,y)=\frac{2G_{R}(x,y) G_{D}(x,y)+C_{2}}{(G_{R}(x,y))^{2}+(G_{D}(x,y))^{2}+C_{3}} $$
(11)

where \(G_{R}(x,y)\) and \(G_{D}(x,y)\) are the GMs of R and D at the pixel (x,y), respectively. \(C_{2}\) and \(C_{3}\) are relatively small positive numbers.

The value range of \(S_{G}(x,y)\) is (0,1]. The smaller the value is, the more severe the GM distortion. When \(S_{G}(x,y)=1\), the GMs of R and D at that pixel are identical. \(C_{3}\) prevents the denominator of Eq. (11) from vanishing. \(C_{2}\) and \(C_{3}\) also play important roles in adjusting the contrast response in low-gradient regions.

Then, \(S_{M}\) and \(S_{G}\) are combined to derive the similarity \(S_{L}\) of R and D. \(S_{L}\) is defined as:

$$ S_{L} = \left[S_{M} \right]^{\alpha} \cdot \left[S_{G}\right]^{\beta} $$
(12)

where α and β are parameters used to adjust the relative importance of MS and GM features. In this study, α=β=1 is set for simplicity, so Eq. (12) reduces to:

$$ S_{L} = S_{M} \cdot S_{G} $$
(13)

Finally, the MPC map is used as the pooling function to obtain the RVSIM index:

$$ RVSIM=\frac{\sum_{(x,y) \in \Omega}S_{L}(x,y) \cdot MPC(x,y)}{\sum_{(x,y) \in \Omega}MPC(x,y)} $$
(14)

where Ω means the whole image spatial domain.
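The last steps, Eqs. (10), (11), (13), and (14), can be sketched as follows. The maps are represented as flat lists of per-pixel values purely for illustration, the function names are hypothetical, and the gradient magnitudes are taken as given inputs because the gradient operator is not specified in this section.

```python
CSF_WEIGHTS = [0.3370, 0.8962, 0.9809, 0.9753, 0.7411]  # w_1..w_5 from Section 2.4

def weighted_ms_similarity(s_mi, weights=CSF_WEIGHTS):
    """Eq. (10): CSF-weighted sum of the five per-band similarities at one pixel."""
    return sum(w * s for w, s in zip(weights, s_mi))

def gm_similarity(g_r, g_d, c2, c3):
    """Eq. (11): gradient magnitude similarity at one pixel."""
    return (2.0 * g_r * g_d + c2) / (g_r ** 2 + g_d ** 2 + c3)

def rvsim(s_m, s_g, mpc_map):
    """Eqs. (13) and (14): MPC-weighted average of S_L = S_M * S_G over all pixels."""
    numerator = sum(m * g * p for m, g, p in zip(s_m, s_g, mpc_map))
    denominator = sum(mpc_map)
    return numerator / denominator
```

As a usage example, two pixels with \(S_{L}\) values 1.0 and 0.5 and MPC values 3.0 and 1.0 give an index of (3.0 + 0.5) / 4.0 = 0.875: the pixel with the stronger phase congruency dominates the score. Note that, as Eq. (10) is written, the weights are not normalized, so five per-band similarities of 1 give \(S_{M} \approx 3.9305\).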

4 Experimental results and discussion

This study runs the RVSIM index on five image databases, namely, LIVE [45], CSIQ [54], TID2008 [55], TID2013 [56], and Waterloo Exploration database [57], to verify the performance of the proposed method. The five image databases are used here for algorithm validation and comparison. The characteristics of these five databases are summarized in Table 1.

Table 1 Comparison of five IQA databases

For the LIVE, CSIQ, TID2008, and TID2013 databases, the five-parameter nonlinear logistic regression function in Eq. (15) is used to fit the data [58]. Moreover, four indicators, namely the Spearman rank-order correlation coefficient (SROCC), the Kendall rank-order correlation coefficient (KROCC), the Pearson linear correlation coefficient (PLCC), and the root mean square error (RMSE), are used to compare the performance of the index objectively [59].

$$ f(z) = {{\beta_{1}}}{{\left[{\frac{1}{2}-\frac{1}{{1 + \exp({\beta_{2}}(z-{\beta_{3}}))}}}\right]}}+{{\beta_{4}}}z+{{\beta_{5}}} $$
(15)

where z is the objective IQA index, f(z) is the IQA regression index, and \(\beta_{i}\) (i=1,…,5) are the regression function parameters.
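The mapping in Eq. (15) and three of the four indicators can be sketched in plain Python. The function names are hypothetical, the fitting of the β parameters (a nonlinear least-squares problem) is not shown, and ties are handled with average ranks, which is a common convention rather than something stated in the text.

```python
import math

def logistic5(z, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (15)."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (z - b3)))) + b4 * z + b5

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def rmse(x, y):
    """Root mean square error between mapped scores and subjective scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

SROCC depends only on the ordering of the scores, so it is unaffected by the monotonic mapping, whereas PLCC and RMSE are computed after the objective scores have been passed through f(z).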

For the Waterloo Exploration database, the group MAximum Differentiation (gMAD) competition, which provides the strongest test to let the IQA models compete with each other [60], is carried out. The gMAD competition can automatically select a subset of image pairs from the database, which provides the competition ranking and reveals the relative performance of the IQA models.

4.1 Determination of parameters

4.1.1 Determination of the constants C1, C2, and C3

Orthogonal experiments were conducted on the LIVE database using the assessment index SROCC to determine the optimal values of the constants \(C_{1}\), \(C_{2}\), and \(C_{3}\). Two rounds of orthogonal experiments were conducted to achieve a balance between the complexity of the experiment and the determination of the parameters. Similar to the SSIM model [12], \([C_{1}, C_{2}, C_{3}] = [(K_{1}L)^{2}, (K_{2}L)^{2}, (K_{3}L)^{2}]\), where L is the dynamic range of the pixel values. For an 8-bit grayscale image, \(L=2^{8}-1=255\).

Fig. 6 Determination of the optimal values of K1, K2, and K3. a K2=1.0 and K3=1.0, b K1=1.0 and K3=1.0, c K1=1.0 and K2=1.2, d K2=1.2 and K3=1.0, e K1=1.09 and K3=1.0, and f K1=1.09 and K2=1.16

  1. First round: In the first step, K2=1.0 and K3=1.0 are fixed, and the RVSIM index is applied to the LIVE database for different values of K1 to obtain the K1–SROCC curve. As shown in Fig. 6a, SROCC reaches its maximum value when K1=1.0. In the second step, K1=1.0 and K3=1.0 are fixed, and the RVSIM index is applied to the LIVE database for different values of K2 to obtain the K2–SROCC curve. As shown in Fig. 6b, SROCC reaches its maximum value when K2=1.2. In the third step, K1=1.0 and K2=1.2 are fixed, and the RVSIM index is applied to the LIVE database for different values of K3 to obtain the K3–SROCC curve. As shown in Fig. 6c, SROCC reaches its maximum value when K3=1.0. At this point, the first round of experiments ends. The parameters are K1=1.0, K2=1.2, and K3=1.0.

  2. Second round: Starting from the parameters obtained in the first round, the three steps of the first round are repeated to obtain the results shown in Fig. 6d–f. At the end of the second round of experiments, the finalized parameters are K1=1.09, K2=1.16, and K3=1.00.
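The two-round procedure above amounts to a coordinate-wise search: one parameter is varied while the others are held fixed, and the cycle is repeated. The following is a minimal sketch of that idea. The scoring function `toy_score` is a hypothetical stand-in for "SROCC of RVSIM on LIVE", chosen so that its maximum lies at the finalized parameters, and the candidate grid is illustrative only.

```python
def coordinate_search(score, start, grid, rounds=2):
    """Tune one parameter at a time, keeping the others fixed, for a few rounds."""
    params = list(start)
    for _ in range(rounds):
        for idx in range(len(params)):
            best_value, best_score = params[idx], score(params)
            for candidate in grid:
                trial = params[:idx] + [candidate] + params[idx + 1:]
                trial_score = score(trial)
                if trial_score > best_score:
                    best_value, best_score = candidate, trial_score
            params[idx] = best_value
    return params

def constants_from_k(k, dynamic_range=255):
    """[C1, C2, C3] = [(K1 L)^2, (K2 L)^2, (K3 L)^2]."""
    return [(ki * dynamic_range) ** 2 for ki in k]

def toy_score(k):
    """Hypothetical stand-in for the SROCC obtained with parameters (K1, K2, K3)."""
    return -((k[0] - 1.09) ** 2 + (k[1] - 1.16) ** 2 + (k[2] - 1.00) ** 2)

grid = [round(0.50 + 0.01 * i, 2) for i in range(151)]  # 0.50 .. 2.00, illustrative
best_k = coordinate_search(toy_score, [1.0, 1.0, 1.0], grid)
best_c = constants_from_k(best_k)
```

Because the toy score is separable, a single round would already find its maximum; for a real SROCC surface the parameters interact, which is why a second round can shift the result, as it did from K1=1.0, K2=1.2 to K1=1.09, K2=1.16 in the experiments above.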

4.1.2 Determination of the Log-Gabor filter bank

As described in Section 2.2, the finalized splicing parameters of the Log-Gabor filter bank are the number of scales \(n_{r}=5\) and the number of orientations \(n_{\theta}=1\). To illustrate the rationality of this selection, Table 2 lists the SROCC/KROCC/PLCC/RMSE values obtained by applying the RVSIM index to the LIVE, CSIQ, TID2008, and TID2013 databases with different splicing parameters. The top performance is highlighted in bold. Table 2 shows that the RVSIM index exhibits its best performance when \(n_{r}=5\) and \(n_{\theta}=1\).

Table 2 SROCC/KROCC/PLCC/RMSE values comparison with different splicing parameters on four benchmark databases

4.2 Two sample examples

In order to determine whether the proposed RVSIM method agrees with human judgment, two sample images (Fig. 7a,g, which are the same as Fig. 2f,l) in the LIVE database [45] are taken as examples. Corresponding to these two ground truth images, we select five noise-distorted images and five blur-distorted images in different degrees from the LIVE database.

Fig. 7 Two groups of images and their corresponding subjective/objective scores. a–f The original image monarch and five noise-distorted images. g–l The original image sailing2 and five blur-distorted images

As shown in Fig. 7, images seem to degrade with increasing blur or noise from left to right. The LIVE database provides the difference mean opinion score (DMOS) for each image. A small DMOS represents a high-quality image. We calculate the objective scores of these images using the RVSIM method. The results can be found in Fig. 7.

Figure 7 shows that RVSIM index is consistent with DMOS. This indicates that RVSIM method, in line with the subjective perception of HVS, can work well in indicating the image quality.

4.3 Performance comparison

Table 3 lists the performance of RVSIM and 11 other state-of-the-art IQA methods (including PSNR, SSIM [12], GSSIM [23], MS-SSIM [21], IW-SSIM [26], FSIM [29], RFSIM [30], VSI [61], SCQI [13], MDSI [62], and SRSIM [63]) on the LIVE, CSIQ, TID2008, and TID2013 databases. The top 3 performances of the indices are highlighted in bold. Apart from GSSIM, the MATLAB source codes of all of the other methods were obtained from the authors. Compared with traditional methods such as PSNR, SSIM, GSSIM, and MS-SSIM, RVSIM exhibits a good performance on the LIVE and CSIQ databases. Because the orthogonal experiments for parameter selection were conducted only on the LIVE database and not on the TID2008 and TID2013 databases, RVSIM performs slightly worse than the best results on TID2008 and TID2013.

Table 3 Performance comparison of IQA methods on four benchmark databases

Figure 8 shows the scatter distributions of the subjective DMOS versus the quality/distortion predicted scores by PSNR, SSIM, MS-SSIM, IW-SSIM, FSIM, SCQI, MDSI, RFSIM, and RVSIM indices on the LIVE database. Figure 8 shows that the scatter plot of RVSIM is evenly distributed throughout the coordinate system and has a strong linear relationship with DMOS, which indicates that the RVSIM model has a strong congruency with HVS.

Fig. 8 Scatter plots of predicted image quality indices on the LIVE database. a PSNR, b SSIM, c MS-SSIM, d IW-SSIM, e FSIM, f SCQI, g MDSI, h RFSIM, and i RVSIM

Experiments on these four databases (LIVE, CSIQ, TID2008, and TID2013) alone are insufficient to characterize the performance of a method. This study therefore conducted the gMAD competition on the Waterloo Exploration database to test the performance of RVSIM objectively and fairly.

Figure 9 shows the competition ranking in the Waterloo Exploration database. In the gMAD competition experiment, the ranking results of 16 state-of-the-art methods are provided by the official framework [60]. An experimenter can only add a new algorithm to the ranking of these 16 provided algorithms. The algorithms added in Fig. 9a–f are RVSIM, SRSIM, RFSIM, VSI, MDSI, and SCQI, respectively. Notably, the overall performance of RVSIM ranked first. In particular, RVSIM performs consistently well in terms of aggressiveness, validating that it is a robust IQA method.

Fig. 9 gMAD competition. a RVSIM, b SRSIM, c RFSIM, d VSI, e MDSI, and f SCQI

4.4 Discussion

In Table 3, six methods have entries highlighted in bold, i.e., MDSI (16 times), SCQI (12 times), VSI (9 times), SRSIM (4 times), FSIM (3 times), and RVSIM (3 times). In Fig. 9, the top 6 methods of the gMAD competition are RVSIM, SRSIM, MS-SSIM, VSI, MDSI, and RFSIM. Based on Table 3 and Fig. 9, the method rank statistics are summarized in Table 4. The proposed RVSIM is highlighted in bold.

Table 4 Summary of the method rank statistics on five databases LIVE, CSIQ, TID2008, TID2013, and Waterloo Exploration

Table 4 shows that the conclusion of indicator performance on the LIVE, CSIQ, TID2008, and TID2013 databases and the conclusion of gMAD competitive ranking on the Waterloo Exploration database are not exactly the same. MDSI ranked first in indicator performance, but ranked fifth in gMAD competition. SCQI ranked second in indicator performance, but performed poorly in gMAD competition. VSI ranked third in indicator performance, but ranked fourth in gMAD competition. SRSIM ranked fourth in indicator performance, but ranked second in gMAD competition. Although RVSIM, SRSIM, and MS-SSIM are not ranked at the top in indicator performance, they exhibited good results in gMAD competition. In particular, RVSIM had the highest rank in gMAD competition.

Which results should be considered? The performance indices of a method and its gMAD competition ranking are two different bases for judgment. The performance indices can objectively reflect the performance of the method, but the benchmark databases provide only a limited number of images because subjective scoring is time-consuming and laborious. gMAD competitions are performed between methods, and the competitive ranking objectively reflects the relative performance of the IQA models. However, subjective scoring is still needed, because the Waterloo Exploration database is so large that its creators did not provide DMOS values for the images in advance. In other words, both approaches have their merits and their limitations. A method that performs well in both the performance indices and the gMAD competitive ranking can be considered an excellent and more objective method. From this point of view, RVSIM exhibits a more consistent and stable performance than the other methods.

5 Conclusion

This study proposes an FR IQA method called RVSIM, which combines the Riesz transform and visual contrast sensitivity. RVSIM takes full advantage of the MS theory and Log-Gabor filter by exploiting CSF to allocate the weights of different frequency bands. At the same time, GM similarity is introduced to obtain the gradient similarity matrix. Then, the MPC matrix is used to construct the pooling function and obtain the RVSIM index.

This study conducts experiments involving the RVSIM index on five benchmark IQA databases. The conclusion of the indicator performance indicates that the RVSIM index delivers a highly competitive prediction accuracy on the LIVE and CSIQ databases. The scatter plot of the subjective DMOS versus scores obtained by RVSIM prediction on the LIVE database suggests that the RVSIM model has a strong congruency with HVS. The conclusion of gMAD competition ranking on the Waterloo Exploration database implies that the performance of the RVSIM method is better than that of advanced IQA methods. The overall performance on all five databases demonstrates that RVSIM is a robust IQA method.