An effective masked transformer network for image denoising

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

Deep learning networks have become increasingly popular for image denoising over the past decade. Their exceptional performance is typically rooted in learning the mapping from noisy images to clean ones through extensive training on image datasets. However, mismatches in noise type and intensity between test and training images significantly degrade their performance. This weak generalization capability makes it necessary, in practical applications, to train separate denoising models for different noise types and levels. To address this challenge, we introduce an effective masked Transformer network that incorporates a random mask module. Specifically, the random mask module is integrated into the Transformer to randomly discard certain features, thereby strengthening the network's generalization capability. The random mask module is also applied at the input processing stage to reduce reliance on the complete original image. Additionally, sampling operations are added to the network: a downsampling layer reduces the image to half of its original size, significantly improving execution efficiency. Experimental results show that the proposed masked Transformer network surpasses state-of-the-art methods such as SwinIR and Restormer in both denoising effectiveness and execution efficiency on synthetic and real-world noisy images.
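The two mechanisms named in the abstract, random feature masking and 2x downsampling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names (`random_mask`, `downsample_half`) and the mask ratio are hypothetical, and a Bernoulli element-wise mask with 2x2 average pooling is assumed as a plausible reading of the description.

```python
import numpy as np

def random_mask(features: np.ndarray, mask_ratio: float = 0.3, seed=None) -> np.ndarray:
    """Zero out a random fraction of feature elements (Bernoulli masking)."""
    rng = np.random.default_rng(seed)
    keep = rng.random(features.shape) >= mask_ratio  # True where the feature survives
    return features * keep

def downsample_half(image: np.ndarray) -> np.ndarray:
    """Halve the spatial resolution of a 2-D image via 2x2 average pooling."""
    h, w = image.shape
    h2, w2 = h // 2, w // 2
    # Crop to even dimensions, group pixels into 2x2 blocks, average each block.
    return image[: 2 * h2, : 2 * w2].reshape(h2, 2, w2, 2).mean(axis=(1, 3))

# Example: mask a fraction of an 8x8 "feature map", then halve its resolution.
x = np.ones((8, 8))
masked = random_mask(x, mask_ratio=0.3, seed=0)
pooled = downsample_half(masked)
print(masked.shape, pooled.shape)  # (8, 8) (4, 4)
```

In the paper's design the mask is applied both to intermediate Transformer features and to the network input, forcing the model not to depend on any fixed subset of image information.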

Data Availability

The data and materials are available from the corresponding author on reasonable request.

References

  1. Chen, W., Huang, Y., Wang, M., Wu, X., Zeng, X.: Tsdn: Two-stage raw denoising in the dark. IEEE Trans. Image Process. 32, 3679–3689 (2023)

  2. Wei, P., Xie, Z., Li, G., Lin, L.: Taylor neural network for real-world image super-resolution. IEEE Trans. Image Process. 32, 1942–1951 (2023)

  3. VS, V., Oza, P., Patel, V.M.: Instance relation graph guided source-free domain adaptive object detection. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3520–3530 (2023)

  4. Ling, Y., Wang, Y., Dai, W., Yu, J., Liang, P., Kong, D.: Mtanet: Multi-task attention network for automatic medical image segmentation and classification. IEEE Trans. Med. Imaging 43(2), 674–685 (2024)

  5. Zhang, Y., Li, K., Li, K., Sun, G., Kong, Y., Fu, Y.: Accurate and fast image denoising via attention guided scaling. IEEE Trans. Image Process. 30, 6255–6265 (2021)

  6. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)

  7. Zhang, K., Zuo, W., Zhang, L.: FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 27(9), 4608–4622 (2018)

  8. Zhang, K., Li, Y., Zuo, W., Zhang, L., Gool, L.V., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE Trans. Pattern Anal. Mach. Intell. (in press)

  9. Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: SwinIR: Image restoration using swin transformer. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, Canada, pp. 1833–1844 (2021)

  10. Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H.: Restormer: Efficient transformer for high-resolution image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, USA, pp. 5728–5739 (2022)

  11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in neural information processing systems, vol. 25. Curran Associates Inc (2012)

  12. Chen, H., Gu, J., Liu, Y., Magid, S.A., Dong, C., Wang, Q., Pfister, H., Zhu, L.: Masked image training for generalizable deep image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1692–1703 (2023)

  13. Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 2808–2817 (2017)

  14. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)

  15. Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, pp. 2862–2869 (2014)

  16. Dong, W., Zhang, L., Shi, G., Li, X.: Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)

  17. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  18. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Advances in neural information processing systems, Vol. 30 (2017)

  19. Parmar, N.J., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., Tran, D.: Image transformer. In: International Conference on Machine Learning (ICML) (2018)

  20. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European conference on computer vision, pp. 213–229 (2020)

  21. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

  22. Zhang, K., Li, Y., Liang, J., Cao, J., Zhang, Y., Tang, H., Fan, D.-P., Timofte, R., Gool, L.V.: Practical blind image denoising via swin-conv-unet and data synthesis. Mach. Intell. Res. 20(6), 822–836 (2023)

  23. Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., Li, H.: Uformer: A general u-shaped transformer for image restoration. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17662–17672 (2022)

  24. Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., Hu, H.: Simmim: A simple framework for masked image modeling. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

  25. Bao, H., Dong, L., Piao, S., Wei, F.: Beit: Bert pre-training of image transformers (2022). arXiv:2106.08254

  26. Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. In: International Conference on Learning Representations (ICLR) (2022)

  27. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009 (2022)

  28. Gao, P., Ma, T., Li, H., Dai, J., Qiao, Y.: Convmae: Masked convolution meets masked autoencoders. arXiv preprint arXiv:2205.03892 (2022)

  29. Jia, X., Liu, S., Feng, X., Zhang, L.: Focnet: A fractional optimal control network for image denoising. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)

  30. Ren, C., He, X., Wang, C., Zhao, Z.: Adaptive consistency prior based deep network for image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8596–8606 (2021)

  31. Mou, C., Zhang, J., Wu, Z.: Dynamic attentive graph learning for image restoration. In: IEEE International Conference on Computer Vision (ICCV) (2021)

  32. Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1874–1883 (2016)

  33. Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L.: Waterloo exploration database: New challenges for image quality assessment models. IEEE Trans. Image Process. 26(2), 1004–1016 (2016)

  34. Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1256–1272 (2016)

  35. Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 126–135 (2017)

  36. Guo, X., Li, Y., Ling, H.: LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 26(2), 982–993 (2017)

  37. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 2, pp. 416–423 (2001)

  38. Huang, J.-B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5197–5206 (2015)

  39. Maity, A., Pattanaik, A., Sagnika, S., Pani, S.: A comparative study on approaches to speckle noise reduction in images. In: 2015 International Conference on Computational Intelligence and Networks, IEEE, pp. 148–155 (2015)

  40. Abdelhamed, A., Lin, S., Brown, M.S.: A high-quality denoising dataset for smartphone cameras. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1692–1700 (2018)

  41. Liang, T., Jin, Y., Li, Y., Wang, T.: Edcnn: Edge enhancement-based densely connected network with compound loss for low-dose CT denoising. In: 2020 15th IEEE International Conference on Signal Processing (ICSP) (2020)

  42. Ren, C., He, X., Wang, C., Zhao, Z.: Adaptive consistency prior based deep network for image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8596–8606 (2021)

Acknowledgements

The authors would like to thank the authors of [6–10, 12, 31, 42] for providing their source code.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62162043.

Author information

Contributions

Shaoping Xu contributed to the conception of the study; Nan Xiao wrote the main manuscript text; Wuyong Tao and Changfei Zhou contributed significantly to the analysis and manuscript preparation; Minghai Xiong conducted the experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Shaoping Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, S., Xiao, N., Tao, W. et al. An effective masked transformer network for image denoising. SIViP (2024). https://doi.org/10.1007/s11760-024-03210-4
