An effective masked transformer network for image denoising

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

Deep learning networks have become increasingly popular for image denoising over the past decade. Their exceptional performance is typically rooted in learning the mapping from noisy images to clean ones through extensive training on image datasets. However, mismatches in noise type and intensity between test and training images significantly degrade their performance. This weak generalization capability makes it necessary, in practical applications, to train separate denoising models for different noise types and levels. To address this challenge, we introduce an effective masked Transformer network that incorporates a random mask module. Specifically, the random mask module is integrated into the Transformer to randomly discard certain features, thereby strengthening the network's generalization capability. The random mask module is also applied at the input processing stage to reduce reliance on the complete original image. Additionally, sampling operations are added to the network: a downsampling layer reduces the image to half of its original size, significantly improving execution efficiency. Experimental results show that the proposed masked Transformer network surpasses state-of-the-art methods such as SwinIR and Restormer in both denoising effectiveness and execution efficiency on synthetic and real-world noisy images.
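The two mechanisms named in the abstract, random feature masking and 2x downsampling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names (`random_mask`, `downsample_half`) and the mask ratio are hypothetical, and a Bernoulli element-wise mask with 2x2 average pooling is assumed as a plausible reading of the description.

```python
import numpy as np

def random_mask(features: np.ndarray, mask_ratio: float = 0.3, seed=None) -> np.ndarray:
    """Zero out a random fraction of feature elements (Bernoulli masking)."""
    rng = np.random.default_rng(seed)
    keep = rng.random(features.shape) >= mask_ratio  # True where the feature survives
    return features * keep

def downsample_half(image: np.ndarray) -> np.ndarray:
    """Halve the spatial resolution of a 2-D image via 2x2 average pooling."""
    h, w = image.shape
    h2, w2 = h // 2, w // 2
    # Crop to even dimensions, group pixels into 2x2 blocks, average each block.
    return image[: 2 * h2, : 2 * w2].reshape(h2, 2, w2, 2).mean(axis=(1, 3))

# Example: mask a fraction of an 8x8 "feature map", then halve its resolution.
x = np.ones((8, 8))
masked = random_mask(x, mask_ratio=0.3, seed=0)
pooled = downsample_half(masked)
print(masked.shape, pooled.shape)  # (8, 8) (4, 4)
```

In the paper's design the mask is applied both to intermediate Transformer features and to the network input, forcing the model not to depend on any fixed subset of image information.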

Data Availability

The data and materials are available from the corresponding author on reasonable request.

References

  1. Chen, W., Huang, Y., Wang, M., Wu, X., Zeng, X.: Tsdn: Two-stage raw denoising in the dark. IEEE Trans. Image Process. 32, 3679–3689 (2023)

  2. Wei, P., Xie, Z., Li, G., Lin, L.: Taylor neural network for real-world image super-resolution. IEEE Trans. Image Process. 32, 1942–1951 (2023)

  3. VS, V., Oza, P., Patel, V.M.: Instance relation graph guided source-free domain adaptive object detection. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3520–3530 (2023)

  4. Ling, Y., Wang, Y., Dai, W., Yu, J., Liang, P., Kong, D.: Mtanet: Multi-task attention network for automatic medical image segmentation and classification. IEEE Trans. Med. Imaging 43(2), 674–685 (2024)

  5. Zhang, Y., Li, K., Li, K., Sun, G., Kong, Y., Fu, Y.: Accurate and fast image denoising via attention guided scaling. IEEE Trans. Image Process. 30, 6255–6265 (2021)

  6. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)

  7. Zhang, K., Zuo, W., Zhang, L.: FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 27(9), 4608–4622 (2018)

  8. Zhang, K., Li, Y., Zuo, W., Zhang, L., Gool, L.V., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE Trans. Pattern Anal. Mach. Intell. (in press)

  9. Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: SwinIR: Image restoration using swin transformer. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, Canada, pp. 1833–1844 (2021)

  10. Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H.: Restormer: Efficient transformer for high-resolution image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, USA, pp. 5728–5739 (2022)

  11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in neural information processing systems, vol. 25. Curran Associates Inc (2012)

  12. Chen, H., Gu, J., Liu, Y., Magid, S.A., Dong, C., Wang, Q., Pfister, H., Zhu, L.: Masked image training for generalizable deep image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1692–1703 (2023)

  13. Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 2808–2817 (2017)

  14. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)

  15. Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, pp. 2862–2869 (2014)

  16. Dong, W., Zhang, L., Shi, G., Li, X.: Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)

  17. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  18. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Advances in neural information processing systems, Vol. 30 (2017)

  19. Parmar, N.J., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., Tran, D.: Image transformer. In: International Conference on Machine Learning (ICML) (2018)

  20. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European conference on computer vision, pp. 213–229 (2020)

  21. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

  22. Zhang, K., Li, Y., Liang, J., Cao, J., Zhang, Y., Tang, H., Fan, D.-P., Timofte, R., Gool, L.V.: Practical blind image denoising via swin-conv-unet and data synthesis. Mach. Intell. Res. 20(6), 822–836 (2023)

  23. Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., Li, H.: Uformer: A general u-shaped transformer for image restoration. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17662–17672 (2022)

  24. Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., Hu, H.: Simmim: A simple framework for masked image modeling. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  25. Bao, H., Dong, L., Piao, S., Wei, F.: Beit: Bert pre-training of image transformers (2022). arXiv:2106.08254

  26. Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. In: International Conference on Learning Representations (ICLR) (2022)

  27. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009 (2022)

  28. Gao, P., Ma, T., Li, H., Dai, J., Qiao, Y.: Convmae: Masked convolution meets masked autoencoders. arXiv preprint arXiv:2205.03892 (2022)

  29. Jia, X., Liu, S., Feng, X., Zhang, L.: Focnet: A fractional optimal control network for image denoising. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)

  30. Ren, C., He, X., Wang, C., Zhao, Z.: Adaptive consistency prior based deep network for image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8596–8606 (2021)

  31. Mou, C., Zhang, J., Wu, Z.: Dynamic attentive graph learning for image restoration. In: IEEE International Conference on Computer Vision (ICCV) (2021)

  32. Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1874–1883 (2016)

  33. Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L.: Waterloo exploration database: New challenges for image quality assessment models. IEEE Trans. Image Process. 26(2), 1004–1016 (2016)

  34. Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1256–1272 (2016)

  35. Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 126–135 (2017)

  36. Guo, X., Li, Y., Ling, H.: LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 26(2), 982–993 (2017)

  37. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 2, pp. 416–423 (2001)

  38. Huang, J.-B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5197–5206 (2015)

  39. Maity, A., Pattanaik, A., Sagnika, S., Pani, S.: A comparative study on approaches to speckle noise reduction in images. In: 2015 International Conference on Computational Intelligence and Networks, IEEE, pp. 148–155 (2015)

  40. Abdelhamed, A., Lin, S., Brown, M.S.: A high-quality denoising dataset for smartphone cameras. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1692–1700 (2018)

  41. Liang, T., Jin, Y., Li, Y., Wang, T.: Edcnn: Edge enhancement-based densely connected network with compound loss for low-dose CT denoising. In: 2020 15th IEEE International Conference on Signal Processing (ICSP) (2020)

  42. Ren, C., He, X., Wang, C., Zhao, Z.: Adaptive consistency prior based deep network for image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8596–8606 (2021)

Acknowledgements

The authors would like to thank the authors of [6–10, 12, 31, 42] for providing their source code.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62162043.

Author information

Contributions

Shaoping Xu contributed to the conception of the study; Nan Xiao wrote the main manuscript text; Wuyong Tao and Changfei Zhou contributed significantly to the analysis and manuscript preparation; Minghai Xiong conducted the experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Shaoping Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, S., Xiao, N., Tao, W. et al. An effective masked transformer network for image denoising. SIViP (2024). https://doi.org/10.1007/s11760-024-03210-4
