Abstract
This paper presents a novel unsupervised blind image quality assessment (BIQA) method that requires no mean opinion scores for model training. The method employs joint spatial and transform features as quality degradation metrics. Specifically, phase congruency, gradient magnitude (GM), the joint GM and Laplacian of Gaussian response, and locally normalized coefficients are extracted as spatial features, while Karhunen–Loève transform coefficients and discrete cosine transform coefficients are modeled as transform features. Both spatial and transform features are analyzed to remove redundancy and then fitted to a multivariate Gaussian model for no-reference image quality assessment. Extensive experiments conducted on seven IQA databases demonstrate the superiority of the proposed method over state-of-the-art supervised and unsupervised BIQA methods.
Introduction
Image quality plays an increasingly important role as multimedia and transmission technologies develop, and accurate, low-complexity image quality assessment has become correspondingly important. Because humans are the ultimate receivers of images, subjective quality assessment has the highest accuracy, but it is time-consuming and expensive, making it impractical for most applications. In contrast, objective quality assessment attempts to assess image quality without human involvement, which is more practical for real-world use.
Objective image quality assessment (IQA) can be classified into three categories based on the usage of reference information, namely full-reference (FR) IQA1,2,3,4, reduced-reference (RR) IQA5,6,7, and no-reference IQA (NR-IQA)/blind IQA (BIQA)8,9,10,11. FR-IQA requires complete access to the reference image and compares it with the distorted image using a distortion measure for quality assessment. RR-IQA utilizes a subset of pre-determined features from both the reference and distorted images for quality assessment and requires less reference information than FR-IQA but more than BIQA. BIQA predicts the perceived quality of an image by extracting and analyzing its internal features, requiring no reference. This makes BIQA highly practical in real-world applications where reference images may be limited or unavailable.
BIQA methods can be further classified into two categories based on the usage of training labels. Supervised BIQA methods usually use subjective scores as labels to train the quality assessment models, and they differ mainly in the features and regression models used. For example, Mittal et al.8 extracted mean subtracted contrast normalized (MSCN) features, Yang et al.12,13 extracted and enhanced naturalness and structural features via the Karhunen–Loève transform (KLT), and Zhang et al.14 proposed to extract quality-aware features from joint generalized local binary pattern statistics. These features were then mapped to subjective scores via support vector regression (SVR) to perform BIQA. Min et al.15 proposed to create multiple pseudo reference images through various types and levels of distortion aggravation, and to apply an FR-IQA method to them to generate similarity scores for BIQA.
With the advent of deep learning technologies, several approaches have been proposed for BIQA that utilize convolutional neural networks (CNNs) for end-to-end joint feature extraction and regression. Ma et al.16 and Zhu et al.17 employed CNNs for this purpose. Wang et al.18 proposed a dual-perception network (DPNet) that uses end-to-end multi-task learning with knowledge distillation, while Lan et al.19 developed a framework that combines two feature extraction networks and a multilevel feature fusion (MFF) network to obtain multilevel degradation features for BIQA. Additionally, Wang et al.20 reformulated BIQA as an ordinal regression problem and achieved improved prediction accuracy by using deep CNNs and Transformers. Madhusudana et al.21 performed the prediction of distortion type and degree as an auxiliary task to learn features for BIQA. Pan et al.22 introduced a distortion-aware module in a CNN to perform BIQA on different distortions. Chen et al.23 proposed an NR-IQA method via feature-level pseudo-reference hallucination. Pan et al.24 proposed a multi-branch convolutional neural network to perform NR-IQA. Zhou et al.25 employed self-attention and a recurrent neural network (RNN) to perform BIQA. Liu et al.26 proposed spatial optimal-scale filtering analysis for deep learning-based BIQA. Cao et al.27 proposed objective audio-visual quality assessment using attentional neural networks. Yu et al.28 employed transformers with self-attention mechanisms to perform NR-IQA.
Compared with supervised BIQA methods, unsupervised ones require no subjective scores for model training and usually have better generalization ability. Wu et al.9 extracted a one-dimensional feature from image binary patterns and utilized this feature to perform unsupervised BIQA with extremely low complexity. The natural image quality evaluator (NIQE)29 and its extensions, e.g., ILNIQE30 and SNP-NIQE31, employed the multivariate Gaussian (MVG) model to perform unsupervised BIQA with different features. Venkatanath et al.32 proposed to estimate quality from perceptually significant spatial regions for unsupervised BIQA. Wu et al.33 proposed to consider the correlation between different color channels through a quaternion representation and then adopted the MVG model to perform unsupervised BIQA. Wu et al.34 also incorporated histogram and deep-learned features along with natural scene statistical features to assess distortions. With the development of quality-aware features, the performance of unsupervised BIQA methods can be further improved.
The main contributions of this paper are summarized as follows:
- We propose an unsupervised BIQA method that evaluates image quality from multiple aspects. We extract phase congruency (PC), gradient magnitude (GM), and the joint GM and Laplacian of Gaussian response (GM-LOG) to measure image structure, and local normalized coefficients to measure image naturalness. Additionally, we extract KLT and discrete cosine transform (DCT) coefficients to measure perceptual image quality in the transform domain.
- Since our method uses multiple features to reflect the characteristics of the human visual system (HVS), we conduct ablation studies to analyze the contributions of the different features, and we perform comprehensive analyses to select the most significant features and remove the redundancy among them.
Method
The framework of the proposed method is shown in Fig. 1. To better measure the perceived quality of an image, we extract spatial and transform features from non-overlapping image patches to form the feature matrix. Herein, to avoid the impact of non-texture patches, we only utilize high-contrast patches in pristine images for model training. During testing, we consider all patches from the test image. We then fit the MVG model with the mean vector \(\nu \) and covariance matrix \(\Sigma \) of the feature matrix.
We calculate the local and global qualities of a distorted image as the distance between the pristine and distorted MVG model parameters:
where n is the number of image patches in each distorted image, \(f_i\) is the feature vector of the i-th patch, and \(\nu _{p}\), \(\nu _{d}\) and \(\Sigma _{p}\), \(\Sigma _{d}\) are the mean vectors and covariance matrices of the pristine-image MVG model and the distorted-image MVG model, respectively.
Finally, the quality of the distorted image is measured as the weighted average of local and global quality as follows:
where we set \(\alpha \) = 0.5 in this paper based on experiments on seven IQA databases as demonstrated in “Parameters optimization and ablation studies”.
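The exact distance formulas are not reproduced here, so the following is only a minimal sketch of how such a score could be computed. It assumes NIQE-style Mahalanobis distances with an averaged covariance matrix, and it assumes that \(\alpha \) weights the local term and \(1-\alpha \) the global term; the function names `fit_mvg`, `mahalanobis`, and `quality_score` are hypothetical.

```python
import numpy as np

def fit_mvg(features):
    """Fit an MVG model: return the mean vector and covariance matrix.

    features: array of shape (n_patches, n_features).
    """
    features = np.asarray(features, dtype=float)
    nu = features.mean(axis=0)
    sigma = np.atleast_2d(np.cov(features, rowvar=False))
    return nu, sigma

def mahalanobis(diff, pooled_cov):
    """sqrt(diff^T pooled_cov^+ diff); the pseudo-inverse tolerates singular matrices."""
    value = diff @ np.linalg.pinv(pooled_cov) @ diff
    return float(np.sqrt(max(value, 0.0)))

def quality_score(pristine_model, test_features, alpha=0.5):
    """Assumed form: alpha * local + (1 - alpha) * global; larger means worse quality."""
    nu_p, sigma_p = pristine_model
    test_features = np.asarray(test_features, dtype=float)
    nu_d, sigma_d = fit_mvg(test_features)
    pooled = (sigma_p + sigma_d) / 2.0  # assumption: NIQE-style averaged covariance
    # Global term: distance between the pristine and distorted model means.
    q_global = mahalanobis(nu_p - nu_d, pooled)
    # Local term: mean distance of each patch feature vector f_i to the pristine mean.
    q_local = float(np.mean([mahalanobis(f - nu_p, pooled) for f in test_features]))
    return alpha * q_local + (1.0 - alpha) * q_global
```

With this convention, a test image whose patch features drift away from the pristine statistics receives a larger score than one whose features follow them.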
Spatial features
An image is a two-dimensional (2D) spatial signal, and therefore, we extract spatial features from images to measure their perceptual quality. Specifically, since the perceived quality of an image is highly dependent on its structure and naturalness, we extract image structure-related features such as PC, GM, and GM-LOG features, along with naturalness-related features like MSCN.
PC is utilized as an indicator of the edge strength of an image, which is highly related to image structure. We adopt the method in35 to compute the PC value of image I at position p as follows:
\[ PC(p)=\frac{\sum _{\theta _{f}}H_{\theta _{f}}(p)}{\varepsilon +\sum _{n}\sum _{\theta _{f}}A_{n,\theta _{f}}(p)} \tag{4} \]
where \(H_{\theta _{f}}(p)=\sqrt{E_{\theta _{f}}(p)^{2}+O_{\theta _{f}}(p)^{2}}\), \(E_{\theta _{f}}(p)=\sum _{n} e_{n,\theta _{f}}(p)\), \(O_{\theta _{f}}(p)=\sum _{n} o_{n,\theta _{f}}(p)\), \(A_{n,\theta _{f}}(p)=\sqrt{e_{n,\theta _{f}}(p)^{2}+o_{n,\theta _{f}}(p)^{2}}\), \(e_{n,\theta _{f}}(p)\) and \(o_{n,\theta _{f}}(p)\) are the responses of the even- and odd-symmetric filters, n and \(\theta _{f}\) are the scale and orientation, and \(\varepsilon \) is a small positive constant. The color relevant space O36 is utilized to extract PC,
and the coefficients are fitted with Weibull distribution to form the feature vector.
GM represents the contrast of an image, which strongly affects the perceived quality of the image, and is therefore an indispensable IQA index. We utilize the filters \(\varvec{D}_{h}=[1,-1]\) and \(\varvec{D}_{v}=\varvec{D}_{h}^{T}\) to extract GM:
\[ GM=\sqrt{(I*\varvec{D}_{h})^{2}+(I*\varvec{D}_{v})^{2}} \tag{6} \]
where \(*\) is the convolution operation, and we compute the GM in the luminance channel. The distribution of the GM coefficients is modeled with the Weibull distribution to form a feature vector.
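As an illustration of the GM extraction step, the sketch below applies the difference filters \(\varvec{D}_{h}=[1,-1]\) and \(\varvec{D}_{v}=\varvec{D}_{h}^{T}\) to a luminance array with NumPy. The function name `gradient_magnitude` and the boundary handling (replicating the first row and column) are assumptions made for this example; the Weibull fitting step is not shown.

```python
import numpy as np

def gradient_magnitude(luma):
    """Gradient magnitude from the difference filters D_h = [1, -1] and D_v = D_h^T.

    Convolving with [1, -1] gives I[x] - I[x-1]. The first row/column is
    replicated so the output keeps the input shape (assumed boundary choice).
    """
    luma = np.asarray(luma, dtype=float)
    gx = np.diff(luma, axis=1, prepend=luma[:, :1])  # horizontal differences
    gy = np.diff(luma, axis=0, prepend=luma[:1, :])  # vertical differences
    return np.sqrt(gx ** 2 + gy ** 2)
```

On a horizontal ramp whose intensity increases by one per pixel, every interior response equals one, which is a quick sanity check for the filter orientation.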
In addition, we extract selected GM-LOG features as structure features. We utilize Eq. (6) to extract the GM feature, with \(D_{h}\) and \(D_{v}\) being the Gaussian partial derivative filters in the horizontal and vertical directions, respectively. The LOG feature is calculated as follows:
\[ L=I*h_{LOG} \tag{7} \]
where \(h_{LOG}\) is the two-dimensional Laplacian of Gaussian operator. Then, the normalized GM and L are quantized into 10 levels, and the normalized bivariate histogram is calculated to obtain the GM-LOG feature vector. We extract the GM-LOG feature from the pristine images in29 and select the indices whose values are all higher than 0.05 to remove the redundancy.
Image naturalness generally refers to the degree to which an image appears natural or realistic, and it is a factor that affects the perceived quality of an image. Therefore, it is often considered in IQA. The MSCN coefficient distribution of distorted natural scene images differs from that of pristine ones, so we employ it to measure the naturalness of an image. The MSCN coefficient is computed as:
\[ \hat{I}(i,j)=\frac{I(i,j)-\mu (i,j)}{\sigma (i,j)+1} \tag{8} \]
where i and j are the pixel coordinates, and \(\mu \) and \(\sigma \) are calculated as follows:
\[ \mu (i,j)=\sum _{k=-3}^{3}\sum _{l=-3}^{3}\omega _{k,l}I(i+k,j+l) \tag{9} \]
\[ \sigma (i,j)=\sqrt{\sum _{k=-3}^{3}\sum _{l=-3}^{3}\omega _{k,l}\left[ I(i+k,j+l)-\mu (i,j)\right] ^{2}} \tag{10} \]
and \(\omega \) is a unit-volume Gaussian window with size 7 \(\times \) 7. The generalized Gaussian distribution (GGD) in37 is utilized to model the MSCN coefficient distribution as features. In addition, the asymmetric generalized Gaussian distribution (AGGD) is employed to model the products of adjacent MSCN coefficients in four directions (horizontal, vertical, main diagonal, and sub-diagonal)8 as feature representation.
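The MSCN normalization can be sketched in a few lines of NumPy. This is an illustrative version only: the Gaussian standard deviation (7/6), the reflect padding, and the stabilizing constant `c = 1` follow common BRISQUE-style practice and are assumptions here, as are the function names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=7, sigma=7 / 6):
    """Unit-volume 2D Gaussian window (sigma is an assumed value)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def local_weighted_mean(img, window):
    """Weighted local mean with reflect padding; output has the shape of img."""
    r = window.shape[0] // 2
    padded = np.pad(img, r, mode="reflect")
    patches = sliding_window_view(padded, window.shape)  # (H, W, size, size)
    return np.einsum("ijkl,kl->ij", patches, window)

def mscn(img, c=1.0):
    """Mean subtracted contrast normalized coefficients of a grayscale image."""
    img = np.asarray(img, dtype=float)
    w = gaussian_window()
    mu = local_weighted_mean(img, w)
    # Weighted local variance via E[I^2] - mu^2, clipped at 0 against round-off.
    var = local_weighted_mean(img ** 2, w) - mu ** 2
    sigma = np.sqrt(np.maximum(var, 0.0))
    return (img - mu) / (sigma + c)
```

Because the window has unit volume, the weighted variance can be computed as the local mean of squares minus the squared local mean, which avoids a second pass over the neighborhoods. A flat image yields MSCN coefficients of zero everywhere.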
Transform features
Considering that the transform technologies are widely utilized in image processing, we extract transform features from KLT and DCT coefficients to measure image perceptual quality from different aspects.
KLT is a data-driven transform that can extract quality-aware features12,38. Non-overlapping patches of size \(\sqrt{k}\times \sqrt{k}\) are taken from the MSCN-normalized pristine images and vectorized to calculate the covariance matrix. The transform kernel \(\textbf{P}\) of size \(k\times k\) consists of the eigenvectors of the covariance matrix, arranged in descending order of their eigenvalues. We set k to 4 in this paper, and the GGD is adopted to fit the KLT coefficient distribution in each frequency band as the transform feature.
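The kernel learning step described above amounts to an eigendecomposition of the patch covariance matrix. The following sketch shows one way to do this with NumPy, assuming the input images are already MSCN-normalized 2D arrays; the function names are hypothetical and the GGD fitting of each band is not shown.

```python
import numpy as np

def extract_patches(img, side):
    """Vectorize non-overlapping side x side patches into rows."""
    h, w = img.shape
    h, w = h - h % side, w - w % side  # crop to a multiple of the patch side
    blocks = img[:h, :w].reshape(h // side, side, w // side, side)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, side * side)

def learn_klt_kernel(images, k=4):
    """Return (P, eigenvalues) with eigenvectors sorted by descending eigenvalue."""
    side = int(round(np.sqrt(k)))
    patches = np.vstack(
        [extract_patches(np.asarray(im, dtype=float), side) for im in images]
    )
    cov = np.cov(patches, rowvar=False)       # k x k covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order], eigvals[order]  # columns of P are the basis vectors

def klt_coefficients(img, kernel):
    """Project each patch onto the kernel; column j holds the band-j coefficients."""
    side = int(round(np.sqrt(kernel.shape[0])))
    patches = extract_patches(np.asarray(img, dtype=float), side)
    return patches @ kernel
```

With k = 4 the patches are 2 \(\times \) 2, so each image contributes one four-dimensional vector per patch and the kernel is a 4 \(\times \) 4 orthogonal matrix.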
According to Benford’s law, in a number system with base b, the probability that a number has leading digit n is \(p(n) = \log _b{(n+1)}-\log _b{(n)}\). Ou et al.39 found that the distance between the distribution p(n) of the pristine image and the distribution \(p_d(n)\) of the distorted image in the DCT domain is highly correlated with the subjective score of the distorted image. Therefore, we calculate the Euclidean distance between these two distributions in the color relevant space in Eq. (5) as features. An example of Benford’s law for a distorted image is shown in Fig. 2. The distribution of the DCT coefficients of the reference image is very close to the distribution given by Benford’s law, whereas the distributions of the distorted images are quite different.
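To make the Benford feature concrete, the sketch below computes the theoretical leading-digit distribution, the empirical leading-digit distribution of a set of coefficients, and the Euclidean distance between them. It takes Benford's law itself as the pristine reference p(n) and works on any array of coefficients (the DCT step is omitted); the function names are hypothetical.

```python
import numpy as np

def benford_distribution(base=10):
    """p(n) = log_b(n + 1) - log_b(n) for leading digits n = 1 .. base - 1."""
    n = np.arange(1, base)
    return np.log(1.0 + 1.0 / n) / np.log(base)

def first_digit_distribution(values):
    """Empirical distribution of the leading decimal digit of nonzero values."""
    x = np.abs(np.asarray(values, dtype=float).ravel())
    x = x[x > 0]
    if x.size == 0:
        return np.zeros(9)
    exponent = np.floor(np.log10(x))
    # Clip guards against round-off at exact powers of ten.
    digits = np.clip(np.floor(x / 10.0 ** exponent).astype(int), 1, 9)
    counts = np.bincount(digits, minlength=10)[1:]
    return counts / counts.sum()

def benford_distance(values):
    """Euclidean distance between the empirical and Benford distributions."""
    diff = first_digit_distribution(values) - benford_distribution()
    return float(np.linalg.norm(diff))
```

For base 10 the theoretical probabilities sum to one and start at \(\log _{10} 2 \approx 0.301\) for the digit 1, so a coefficient set dominated by large leading digits produces a large distance.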
Significant feature selection and model training
We extract all the above-mentioned spatial and transform features from two scales, i.e., the original scale, and the 1/2 downsampled scale, except for the DCT feature, which is extracted only from the downsampled scale (for lower computational complexity). The original patch size is 96 \(\times \) 96. To reduce the redundancy among different types of features, we select the columns from the feature matrix extracted from 125 pristine natural scene images in29 whose average values are higher than 0.01 as significant features. These pristine images are also utilized to train the benchmark MVG model parameters and KLT kernels.
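The column selection rule can be illustrated as follows. This is a sketch under the assumption that the feature matrix has one row per pristine patch and one column per feature; `select_significant_columns` is a hypothetical name, and the default threshold is the 0.01 value stated above.

```python
import numpy as np

def select_significant_columns(pristine_features, threshold=0.01):
    """Boolean mask of columns whose mean over pristine patches exceeds threshold."""
    pristine_features = np.asarray(pristine_features, dtype=float)
    return pristine_features.mean(axis=0) > threshold

# The mask is computed once from pristine images and reused for test images.
pristine = np.array([[0.5, 0.001, 0.02],
                     [0.3, 0.002, 0.04]])
mask = select_significant_columns(pristine)
test_features = np.array([[0.4, 0.5, 0.03]])
selected = test_features[:, mask]  # keeps columns 0 and 2
```

Computing the mask only from pristine features keeps the selection independent of the test data, so the same columns are used for every distorted image.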
Experimental results
The experiments are conducted on eight widely utilized IQA databases, including the natural scene IQA databases LIVE40, MICT41, CSIQ42, TID201343, KADID-10k44, CID201345, and LIVE Challenge (LIVE-C)46, as well as the screen content IQA database SIQAD47. All of these databases are publicly available.
Comparison with state-of-the-art methods
Table 1 shows the Spearman rank-order correlation coefficient (SROCC) of the proposed method, supervised BIQA methods8,16,48,49, and unsupervised methods29,30,31 on common distortion types (JPEG compression, JPEG2000 compression, white noise, and Gaussian blur) and on real-world distortion. JPEG and JPEG2000 compression refer to lossy image compression with the JPEG and JPEG2000 codecs, which are very common in image compression. White noise refers to additive white Gaussian noise, which is commonly encountered in image acquisition and transmission. Gaussian blur refers to blurring an image with a Gaussian filter, which is quite common in image acquisition. LIVE, MICT, CSIQ_sub, TID2013_sub, and KADID-10k_sub contain the common distortions, while CID2013 and LIVE-C contain real-world distortion. The SROCC values in each row are calculated between the objective scores predicted by the method and the subjective scores given in the databases. We train the supervised models on LIVE and test them on the remaining six IQA databases. “W. A.” refers to the weighted average over the above seven IQA databases, with the number of distorted images in each database as the weight. The best results of both supervised and unsupervised methods are boldfaced. Our method achieves the highest results on both common and authentic distortions, as well as the highest weighted average, compared with the reference methods.
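For readers who want to reproduce this kind of evaluation, the sketch below computes SROCC as the Pearson correlation of average ranks and then forms a weighted average across databases. The function names are hypothetical, and the numbers in the usage lines are made-up illustrations, not results from Table 1.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def srocc(objective, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    r_obj = average_ranks(objective)
    r_sub = average_ranks(subjective)
    return float(np.corrcoef(r_obj, r_sub)[0, 1])

def weighted_average(per_database_scores, num_distorted_images):
    """Average of per-database scores weighted by the number of distorted images."""
    return float(np.average(per_database_scores, weights=num_distorted_images))

# Illustrative values only.
example_srocc = srocc([1, 2, 3, 4], [10, 20, 30, 25])
example_wa = weighted_average([0.9, 0.5], [100, 300])
```

Weighting by the number of distorted images means that large databases such as KADID-10k influence the average far more than small ones.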
We also perform a statistical analysis by applying a t-test to the prediction residuals. The results are also tabulated in Table 1, where (1), (0), and (−1) indicate that our method is statistically superior, comparable, and inferior to the reference method with 95% confidence, respectively. Following31, the residuals are obtained by calculating the differences between the subjective scores and the converted objective scores using Eq. (11):
\[ s(q)=\beta _{1}\left( \frac{1}{2}-\frac{1}{1+e^{\beta _{2}(q-\beta _{3})}}\right) +\beta _{4}q+\beta _{5} \tag{11} \]
where q and s(q) are the objective and the converted scores, respectively, and \(\beta _1\)–\(\beta _5\) are the curve fitting parameters. This conversion is necessary because objective and subjective scores have different scales. The proposed method achieves comparable results with NIQE and SNP-NIQE, and better results than ILNIQE on common distortions.
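The score conversion and the residual computation can be sketched as below. The sketch assumes the five-parameter logistic function that is standard in IQA evaluation; the parameters \(\beta _1\)–\(\beta _5\) would normally be estimated by nonlinear least squares, which is not shown, and the function names are hypothetical.

```python
import numpy as np

def logistic5(q, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from objective scores to the subjective scale."""
    q = np.asarray(q, dtype=float)
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def prediction_residuals(subjective, objective, betas):
    """Differences between subjective scores and converted objective scores."""
    converted = logistic5(objective, *betas)
    return np.asarray(subjective, dtype=float) - converted
```

The residuals of two methods on the same database can then be compared with a two-sample t-test to decide whether one method is statistically better at the chosen confidence level.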
Table 2 shows the feature dimension and average extraction time of different unsupervised BIQA methods. All tested methods are implemented in MATLAB and run on a Windows system with an Intel Core i7-3770 3.40 GHz dual-core CPU and 8 GB RAM. ILNIQE has good generalization performance, but its computational complexity is too high. LPSI has very low computational complexity, but its performance is limited. The proposed method achieves the highest average results on seven databases, and its generalization performance and running time are competitive.
The SROCC results for all distortions on TID2013 and KADID-10k are tabulated in Tables 3 and 4, respectively. For better visualization, the best and the second-best results among the unsupervised methods are boldfaced and underlined, respectively. The proposed method achieves average performance comparable to state-of-the-art unsupervised BIQA methods on both databases, which demonstrates its good performance on uncommon distortions. However, the proposed method fails on some specific distortions, such as non-eccentricity pattern noise (NERP) and mean shift (MS) in Table 3, which are luminance-related distortions, and change of color saturation (CCS) in Table 3 and color diffusion (CD), denoise (DEN), and quantization (QN) in Table 4, which are color-related distortions. Since the features utilized in our method mainly measure image structure and naturalness, the method cannot accurately quantify luminance- and color-related distortions. This is a shortcoming of our method that can be addressed in future work.
These tables show that, with the joint utilization of selected spatial and transform features, the proposed method can assess perceived image quality from different aspects and achieves the highest quality assessment accuracy on most distortion types with relatively low computational complexity. The results show the superiority and good generalization capability of our method as an unsupervised BIQA method.
Parameters optimization and ablation studies
The original GM-LOG features contain many zeros and outliers, which make them less efficient, so we remove these redundant features to improve the efficiency of GM-LOG. We select the GM-LOG features whose minimum values are higher than \(TH_{GM-LOG}\); the indices of the selected features are obtained from pristine images29 and then applied to the features extracted from distorted images. The results for GM-LOG features selected with different \(TH_{GM-LOG}\) values are tabulated in Table 5, with the best results boldfaced. When \(TH_{GM-LOG}\) = 0.05, the proposed method achieves the highest results on CSIQ, KADID-10k, CID2013, and LIVE-C, and comparable results on the remaining three databases. Therefore, considering both efficiency and feature dimension, we set \(TH_{GM-LOG}\) = 0.05 to select efficient GM-LOG features in this paper.
We conduct experiments on seven IQA databases to study the optimal KLT kernel size for transform feature extraction, and we employ SROCC as well as the Kendall rank-order correlation coefficient (KROCC) and the Pearson linear correlation coefficient (PLCC) to evaluate the performance. The objective scores are mapped to the subjective scores via the nonlinear mapping in Eq. (11) before calculating PLCC50. The weighted average results on seven IQA databases are shown in Fig. 3a. As the KLT kernel size increases (from 4 \(\times \) 4 to 16 \(\times \) 16), the three metrics decrease, so we set k = 4 in this paper. We also conduct experiments on seven IQA databases to study the impact of \(\alpha \) in Eq. (3) and plot the weighted average SROCC results in Fig. 3b. When \(\alpha \) = 0.5, the proposed method achieves the highest result on the seven databases.
Since we extract multiple spatial and transform features, some features may contribute less than others. Therefore, to further improve the performance and reduce the feature dimension, we perform significant feature selection to remove the insignificant features. The experiment is conducted on seven IQA databases, and we remove the insignificant features by selecting only the features whose average values are higher than \(TH_{sf}\). We mark these feature indices based on the features extracted from pristine images29 and then test on the IQA databases. The results are tabulated in Table 6, with the best results boldfaced. When \(TH_{sf}\) = 0.01, the proposed method achieves the highest results on MICT, TID2013, and LIVE-C, and comparable results on the remaining four databases.
To verify the effectiveness of the two types of features, i.e., spatial features and transform features, we report the ablation test results in Table 7. The spatial features usually take the leading role and the transform features work as supplements, and the combination of the two achieves the optimal results. However, on the full TID2013 and KADID-10k databases, the combined features perform worse than the transform features alone but better than the spatial features alone, which means that the transform features take the leading role on uncommon artificial distortion types.
Performance on screen content images
Although the proposed method is designed for natural scene images, we also conduct an experiment on the screen image quality assessment database (SIQAD)47 to further verify its generalization ability. The SROCC, KROCC, PLCC, and root mean square error (RMSE) of different unsupervised BIQA methods are tabulated in Table 8, with the best results boldfaced. As before, the objective scores are mapped to the subjective scores via the nonlinear mapping in Eq. (11) before calculating RMSE50. The proposed method achieves the best performance among the competing unsupervised BIQA methods, but the result is still not good enough and can be further improved in the future.
Conclusion
In this paper, we proposed an unsupervised BIQA method utilizing joint spatial and transform features. Specifically, we utilized PC, GM, GM-LOG, and MSCN as spatial features, and KLT and DCT coefficients as transform features. These features were analyzed to remove redundancy and then fitted to an MVG model for unsupervised BIQA. Experiments on multiple IQA databases indicated that the proposed method achieved state-of-the-art results with low complexity on both artificial and authentic distortions. Future work will focus on improving the performance on authentic distortions and illustrated images.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request. No humans or animals were directly involved in this study.
References
Wang, Z., Bovik, A., Sheikh, H. & Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Zhang, L., Zhang, L., Mou, X. & Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 20, 2378–2386 (2011).
Jain, A. & Bhateja, V. A full-reference image quality metric for objective evaluation in spatial domain. In 2011 International Conference on Communication and Industrial Application, 1–5 (2011).
Kim, J. & Lee, S. Deep blind image quality assessment by employing FR-IQA. In 2017 IEEE International Conference on Image Processing (ICIP), 3180–3184 (2017).
Wu, J., Liu, Y., Shi, G. & Lin, W. Saliency change based reduced reference image quality assessment. In 2017 IEEE Visual Communications and Image Processing (VCIP), 1–4 (2017).
Liu, Y. et al. Reduced-reference image quality assessment in free-energy principle and sparse representation. IEEE Trans. Multimed. 20, 379–391 (2018).
Hu, Q., Sheng, Y., Yang, L., Li, Q. & Chai, L. Reduced-reference image quality assessment for single-image super-resolution based on wavelet domain. In 2019 Chinese Control And Decision Conference (CCDC), 2067–2071 (2019).
Mittal, A., Moorthy, A. K. & Bovik, A. C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 21, 4695–4708 (2012).
Wu, Q., Wang, Z. & Li, H. A highly efficient method for blind image quality assessment. In 2015 IEEE International Conference on Image Processing (ICIP), 339–343 (2015).
Ma, H., Cui, Z., Gan, Z., Tang, G. & Liu, F. Saliency-enhanced two-stream convolutional network for no-reference image quality assessment. J. Electron. Imaging 31, 1–19 (2022).
Sang, Q. et al. MoNET: No-reference image quality assessment based on a multi-depth output network. J. Electron. Imaging 30, 1–17 (2021).
Yang, C., Zhang, X., An, P., Shen, L. & Kuo, C.-C.J. Blind image quality assessment based on multi-scale KLT. IEEE Trans. Multimed. 23, 1557–1566 (2021).
Yang, C., An, P. & Shen, L. Blind image quality measurement via data-driven transform-based feature enhancement. IEEE Trans. Instrum. Meas. 71, 1–12 (2022).
Zhang, M., Muramatsu, C., Zhou, X., Hara, T. & Fujita, H. Blind image quality assessment using the joint statistics of generalized local binary pattern. IEEE Signal Process. Lett. 22, 207–210 (2015).
Min, X., Zhai, G., Gu, K., Liu, Y. & Yang, X. Blind image quality estimation via distortion aggravation. IEEE Trans. Broadcast. 64, 508–517 (2018).
Ma, K. et al. End-to-end blind image quality assessment using deep neural networks. IEEE Trans. Image Process. 27, 1202–1213 (2018).
Zhu, H., Li, L., Wu, J., Dong, W. & Shi, G. MetaIQA: Deep meta-learning for no-reference image quality assessment. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14131–14140 (2020).
Wang, X., Xiong, J., Li, B., Suo, J. & Gao, H. Learning hybrid representations of semantics and distortion for blind image quality assessment. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5 (2023).
Lan, X. et al. Multilevel feature fusion for end-to-end blind image quality assessment. IEEE Trans. Broadcast. 1–11 (2023).
Wang, H., Tu, Y., Liu, X., Tan, H. & Liu, H. Deep ordinal regression framework for no-reference image quality assessment. IEEE Signal Process. Lett. 30, 428–432 (2023).
Madhusudana, P. C., Birkbeck, N., Wang, Y., Adsumilli, B. & Bovik, A. C. Image quality assessment using contrastive learning. IEEE Trans. Image Process. 31, 4149–4161 (2022).
Pan, Z. et al. DACNN: Blind image quality assessment via a distortion-aware convolutional neural network. IEEE Trans. Circ. Syst. Video Technol. 32, 7518–7531 (2022).
Chen, B. et al. No-reference image quality assessment by hallucinating pristine features. IEEE Trans. Image Process. 31, 6139–6151 (2022).
Pan, Z. et al. No-reference image quality assessment via multibranch convolutional neural networks. IEEE Trans. Artif. Intell. 4, 148–160 (2023).
Zhou, M. et al. An end-to-end blind image quality assessment method using a recurrent network and self-attention. IEEE Trans. Broadcast. 69, 369–377 (2023).
Liu, M., Huang, J., Zeng, D., Ding, X. & Paisley, J. A multiscale approach to deep blind image quality assessment. IEEE Trans. Image Process. 32, 1656–1667 (2023).
Cao, Y., Min, X., Sun, W. & Zhai, G. Attention-guided neural networks for full-reference and no-reference audio-visual quality assessment. IEEE Trans. Image Process. 32, 1882–1896 (2023).
Yu, L., Li, J., Pakdaman, F., Ling, M. & Gabbouj, M. MAMIQA: No-reference image quality assessment based on multiscale attention mechanism with natural scene statistics. IEEE Signal Process. Lett. 30, 588–592 (2023).
Mittal, A., Soundararajan, R. & Bovik, A. C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 20, 209–212 (2013).
Zhang, L., Zhang, L. & Bovik, A. C. A feature-enriched completely blind image quality evaluator. IEEE Trans. Image Process. 24, 2579–2591 (2015).
Liu, Y. et al. Unsupervised blind image quality evaluation via statistical measurements of structure, naturalness, and perception. IEEE Trans. Circ. Syst. Video Technol. 30, 929–943 (2020).
N, V., D, P., Bh, M. C., Channappayya, S. S. & Medasani, S. S. Blind image quality evaluation using perception based features. In 2015 Twenty First National Conference on Communications (NCC), 1–6 (2015).
Wu, L., Zhang, X., Chen, H. & Zhou, Y. Unsupervised quaternion model for blind colour image quality assessment. Signal Process. 176, 107708 (2020).
Wu, L., Zhang, X., Chen, H., Wang, D. & Deng, J. VP-NIQE: An opinion-unaware visual perception natural image quality evaluator. Neurocomputing 463, 17–28 (2021).
Kovesi, P. Image features from phase congruency. Videre J. Comput. Vis. Res. 1, 1–26 (1999).
Geusebroek, J., van den Boomgaard, R., Smeulders, A. W. M. & Geerts, H. Color invariance. IEEE Trans. Pattern Anal. Mach. Intell. 23, 1338–1350 (2001).
Sharifi, K. & Leon-Garcia, A. Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video. IEEE Trans. Circ. Syst. Video Technol. 5, 52–56 (1995).
Zhang, X., Kwong, S. & Kuo, C.-C.J. Data-driven transform-based compressed image quality assessment. IEEE Trans. Circ. Syst. Video Technol. 31, 3352–3365 (2021).
Ou, F.-Z., Wang, Y.-G. & Zhu, G. A novel blind image quality assessment method based on refined natural scene statistics. In 2019 IEEE International Conference on Image Processing (ICIP), 1004–1008 (2019).
Sheikh, H. R., Wang, Z., Cormack, L. & Bovik, A. C. LIVE image quality assessment database release 2 (2005). http://live.ece.utexas.edu/research/quality.
Horita, Y., Shibata, K. & Kawayoka, Y. Toyama Image quality evaluation database (2011). http://mict.eng.u-toyama.ac.jp/mictdb.html.
Larson, E. & Chandler, D. Categorical image quality (CSIQ) database (2010). http://vision.okstate.edu/csiq.
Ponomarenko, N. et al. Color image database TID2013: Peculiarities and preliminary results. In European Workshop on Visual Information Processing (EUVIP), 106–111 (2013). http://www.ponomarenko.info/tid2013.htm.
Lin, H., Hosu, V. & Saupe, D. KADID-10k: A large-scale artificially distorted IQA database. In 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), 1–3 (2019). http://database.mmsp-kn.de/kadid-10k-database.html.
Virtanen, T., Nuutinen, M., Vaahteranoksa, M., Oittinen, P. & Häkkinen, J. CID2013: A database for evaluating no-reference image quality assessment algorithms. IEEE Trans. Image Process. 24, 390–402 (2015).
Ghadiyaram, D. & Bovik, A. C. Massive online crowdsourced study of subjective and objective picture quality. IEEE Trans. Image Process. 25, 372–387 (2016).
Yang, H., Fang, Y. & Lin, W. Perceptual quality assessment of screen content images. IEEE Trans. Image Process. 24, 4408–4421 (2015).
Ye, P., Kumar, J., Kang, L. & Doermann, D. Unsupervised feature learning framework for no-reference image quality assessment. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 1098–1105 (2012).
Liu, X., Van De Weijer, J. & Bagdanov, A. D. RankIQA: Learning from rankings for no-reference image quality assessment. In 2017 IEEE International Conference on Computer Vision (ICCV), 1040–1049 (2017).
Antkowiak, J. & Baina, T. J. Final report from the video quality experts group on the validation of objective models of video quality assessment march. ITU-T Standards Contribution COM (2000).
Acknowledgements
This work was supported in part by the NSFC under Grants 61901252, 62171002, 62071287, and 62020106011, and by the Science and Technology Commission of Shanghai Municipality under Grants 22ZR1424300 and 20DZ2290100.
Author information
Contributions
C.Y. conducted project administration, methodology and wrote the main manuscript, Q.H. conducted the experiments and prepared the tables and figures, P.A. conducted project administration and supervision. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, C., He, Q. & An, P. Unsupervised blind image quality assessment via joint spatial and transform features. Sci Rep 13, 10865 (2023). https://doi.org/10.1038/s41598-023-38099-5
DOI: https://doi.org/10.1038/s41598-023-38099-5