
METER: Multi-task efficient transformer for no-reference image quality assessment


Abstract

No-reference image quality assessment (NR-IQA) is a fundamental yet challenging task in computer vision. Current NR-IQA methods based on convolutional neural networks typically employ deeply stacked convolutions to learn local features pertinent to image quality, neglecting the importance of non-local information and distortion types. As a remedy, we introduce in this paper an end-to-end multi-task efficient transformer (METER) for the NR-IQA task, consisting of a multi-scale semantic feature extraction (MSFE) backbone module, a distortion type identification (DTI) module, and an adaptive quality prediction (AQP) module. METER identifies the distortion type via the DTI module to facilitate the extraction of distortion-specific features by the MSFE module. METER then scores image quality adaptively by adjusting the weights and biases of the adaptive fully-connected (AFC) layers in the AQP module, improving generalizability to images captured in different natural environments. Experimental results demonstrate that METER significantly outperforms existing methods in both accuracy and efficiency across five public datasets (LIVEC, BID, KonIQ, LIVE, and CSIQ), achieving Pearson’s linear correlation coefficients of 0.923, 0.912, 0.937, 0.978, and 0.982, respectively, against human subjective scores. METER also attains higher efficiency (53.9% fewer parameters and 87.7% fewer FLOPs) than existing transformer-based methods, making it valuable for real-world applications.
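To make the adaptive-prediction idea concrete, below is a minimal PyTorch sketch of a multi-task head in which a hypernetwork generates the weights and biases of fully-connected layers from the predicted distortion type, so the quality regressor adapts per image. All module names, dimensions, and design details here are illustrative assumptions, not the authors' implementation; the official code is linked under Code Availability.

```python
# Sketch only: a hypernetwork-style "adaptive FC" layer whose parameters are
# predicted from a distortion-type context, conditioning quality regression
# on the identified distortion. Sizes and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFC(nn.Module):
    """Fully-connected layer whose weights/biases are generated per image."""
    def __init__(self, ctx_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.weight_gen = nn.Linear(ctx_dim, in_dim * out_dim)
        self.bias_gen = nn.Linear(ctx_dim, out_dim)

    def forward(self, x: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        # x: (B, in_dim) image features; ctx: (B, ctx_dim) distortion context
        w = self.weight_gen(ctx).view(-1, self.out_dim, self.in_dim)
        b = self.bias_gen(ctx)
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + b

class MultiTaskIQAHead(nn.Module):
    """Joint distortion-type classification and adaptive quality regression."""
    def __init__(self, feat_dim: int = 256, num_distortions: int = 5):
        super().__init__()
        self.dti = nn.Linear(feat_dim, num_distortions)  # distortion logits
        self.afc1 = AdaptiveFC(num_distortions, feat_dim, 64)
        self.afc2 = AdaptiveFC(num_distortions, 64, 1)

    def forward(self, feats: torch.Tensor):
        logits = self.dti(feats)            # distortion-type identification
        ctx = F.softmax(logits, dim=-1)     # soft distortion context
        h = F.relu(self.afc1(feats, ctx))
        score = self.afc2(h, ctx).squeeze(-1)  # scalar quality score per image
        return score, logits

feats = torch.randn(4, 256)                 # stand-in for backbone features
score, logits = MultiTaskIQAHead()(feats)
```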


Availability of data and materials

The datasets generated and/or analysed during the current study are publicly available via the links provided in the corresponding references.
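On these datasets, NR-IQA models are conventionally evaluated by correlating predicted scores with the human mean opinion scores (MOS) using Pearson's linear correlation coefficient (PLCC), as reported in the abstract, and the Spearman rank-order correlation coefficient (SROCC). A minimal SciPy example follows; the score values are placeholders, not results from the paper.

```python
# Standard IQA evaluation: correlate predicted scores with subjective MOS.
# The arrays below are illustrative placeholders only.
import numpy as np
from scipy import stats

mos = np.array([62.1, 45.3, 78.9, 33.0, 55.4])    # human subjective scores
pred = np.array([60.5, 47.8, 80.2, 30.9, 57.1])   # model predictions

plcc, _ = stats.pearsonr(pred, mos)    # linearity of agreement
srocc, _ = stats.spearmanr(pred, mos)  # monotonicity of agreement
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```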

Code Availability

The source code is publicly available at https://github.com/Idea89560041/METER.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 51979021 and 51709028, the Natural Science Foundation of Liaoning under Grant 2019JH8/10100045, the China Scholarship Council (CSC) under Grant 202206570013, the Dalian High-Level Talent Innovation Support Program under Project 2019RQ008, and the Fundamental Research Funds for the Central Universities under Grants 3132022218 and 3132019317.

Author information

Contributions

Pengli Zhu designed the framework and network architecture, carried out the implementation, performed the experiments, and analysed the data. Pengli Zhu and Siyuan Liu wrote the manuscript. Siyuan Liu, Yancheng Liu, and Pew-Thian Yap revised the manuscript. Siyuan Liu conceived the study and was in charge of overall direction and planning.

Corresponding author

Correspondence to Siyuan Liu.

Ethics declarations

Conflict of interest/Competing interests

The authors declare no competing interests.

Ethics approval

The manuscript is submitted with the consent of all authors, who undertake not to submit it to other journals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, P., Liu, S., Liu, Y. et al. METER: Multi-task efficient transformer for no-reference image quality assessment. Appl Intell 53, 29974–29990 (2023). https://doi.org/10.1007/s10489-023-05104-3

