Abstract
Deep learning networks have grown increasingly popular for image denoising over the past decade. Their strong performance stems from learning the mapping from noisy images to clean ones through extensive training on image datasets. However, mismatches in noise type and intensity between test and training images significantly degrade their performance. This weak generalization means that, in practice, separate denoising models must be trained for different noise types and levels of noise interference. To address this challenge, we introduce an effective masked Transformer network. Specifically, a random mask module is integrated into the Transformer to randomly discard certain features, thereby strengthening the network's generalization capability. The random mask module is also applied at the input stage to reduce reliance on complete original-image information. In addition, sampling operations are added to the network: a downsampling layer reduces the image to half its original size, significantly improving execution efficiency. Experimental results show that the proposed masked Transformer network surpasses state-of-the-art methods such as SwinIR and Restormer in both denoising effectiveness and execution efficiency, on synthetic and real-world noisy images.
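The two mechanisms the abstract describes, randomly discarding values and halving the spatial size, can be illustrated with a minimal sketch. This is our own illustration, not code from the paper; the function names `random_mask` and `downsample_half` are ours, and the real network operates on learned feature tensors rather than plain Python lists:

```python
import random

def random_mask(features, mask_ratio=0.3, rng=None):
    """Zero out a random fraction of feature values.

    Illustrates the idea of randomly discarding features so the network
    cannot over-rely on any particular one; the paper applies this both
    to intermediate features and to the input.
    """
    rng = rng or random.Random()
    return [0.0 if rng.random() < mask_ratio else v for v in features]

def downsample_half(image):
    """Halve height and width by keeping every other row and column."""
    return [row[::2] for row in image[::2]]
```

In the actual network the downsampling would presumably be a learned layer rather than plain subsampling, but the efficiency effect is the same: halving each spatial dimension quarters the number of pixels processed per layer.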
Data Availability
The data and materials are available from the corresponding author on reasonable request.
References
Chen, W., Huang, Y., Wang, M., Wu, X., Zeng, X.: Tsdn: Two-stage raw denoising in the dark. IEEE Trans. Image Process. 32, 3679–3689 (2023)
Wei, P., Xie, Z., Li, G., Lin, L.: Taylor neural network for real-world image super-resolution. IEEE Trans. Image Process. 32, 1942–1951 (2023)
VS, V., Oza, P., Patel, V.M.: Instance relation graph guided source-free domain adaptive object detection. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3520–3530 (2023)
Ling, Y., Wang, Y., Dai, W., Yu, J., Liang, P., Kong, D.: Mtanet: Multi-task attention network for automatic medical image segmentation and classification. IEEE Trans. Med. Imaging 43(2), 674–685 (2024)
Zhang, Y., Li, K., Li, K., Sun, G., Kong, Y., Fu, Y.: Accurate and fast image denoising via attention guided scaling. IEEE Trans. Image Process. 30, 6255–6265 (2021)
Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 26(7), 3142–3155 (2017)
Zhang, K., Zuo, W., Zhang, L.: FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 27(9), 4608–4622 (2018)
Zhang, K., Li, Y., Zuo, W., Zhang, L., Gool, L.V., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE Trans. Pattern Anal. Mach. Intell. (in press)
Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: SwinIR: Image restoration using swin transformer. In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, Canada, pp. 1833–1844 (2021)
Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.-H.: Restormer: Efficient transformer for high-resolution image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, Louisiana, USA, pp. 5728–5739 (2022)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in neural information processing systems, vol. 25. Curran Associates Inc (2012)
Chen, H., Gu, J., Liu, Y., Magid, S.A., Dong, C., Wang, Q., Pfister, H., Zhu, L.: Masked image training for generalizable deep image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1692–1703 (2023)
Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 2808–2817 (2017)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, pp. 2862–2869 (2014)
Dong, W., Zhang, L., Shi, G., Li, X.: Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in neural information processing systems, vol. 30 (2017)
Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., Tran, D.: Image transformer. In: International Conference on Machine Learning (ICML) (2018)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European conference on computer vision, pp. 213–229 (2020)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
Zhang, K., Li, Y., Liang, J., Cao, J., Zhang, Y., Tang, H., Fan, D.-P., Timofte, R., Gool, L.V.: Practical blind image denoising via swin-conv-unet and data synthesis. Mach. Intell. Res. 20(6), 822–836 (2023)
Wang, Z., Cun, X., Bao, J., Zhou, W., Liu, J., Li, H.: Uformer: A general u-shaped transformer for image restoration. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17662–17672 (2022)
Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., Hu, H.: Simmim: A simple framework for masked image modeling. In: International Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Bao, H., Dong, L., Piao, S., Wei, F.: BEiT: BERT pre-training of image transformers (2022). arXiv:2106.08254
Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. In: International Conference on Learning Representations (ICLR) (2022)
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009 (2022)
Gao, P., Ma, T., Li, H., Dai, J., Qiao, Y.: ConvMAE: Masked convolution meets masked autoencoders (2022). arXiv:2205.03892
Jia, X., Liu, S., Feng, X., Zhang, L.: Focnet: A fractional optimal control network for image denoising. In: IEEE Conference on Computer Vision and Pattern Recognition (2019)
Ren, C., He, X., Wang, C., Zhao, Z.: Adaptive consistency prior based deep network for image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8596–8606 (2021)
Mou, C., Zhang, J., Wu, Z.: Dynamic attentive graph learning for image restoration. In: IEEE International Conference on Computer Vision (ICCV) (2021)
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1874–1883 (2016)
Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L.: Waterloo exploration database: New challenges for image quality assessment models. IEEE Trans. Image Process. 26(2), 1004–1016 (2016)
Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1256–1272 (2016)
Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 126–135 (2017)
Guo, X., Li, Y., Ling, H.: LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process. 26(2), 982–993 (2017)
Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 2, pp. 416–423 (2001)
Huang, J.-B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5197–5206 (2015)
Maity, A., Pattanaik, A., Sagnika, S., Pani, S.: A comparative study on approaches to speckle noise reduction in images. In: 2015 International Conference on Computational Intelligence and Networks, IEEE, pp. 148–155 (2015)
Abdelhamed, A., Lin, S., Brown, M.S.: A high-quality denoising dataset for smartphone cameras. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1692–1700 (2018)
Liang, T., Jin, Y., Li, Y., Wang, T.: EDCNN: Edge enhancement-based densely connected network with compound loss for low-dose CT denoising. In: 2020 15th IEEE International Conference on Signal Processing (ICSP) (2020)
Funding
This research was funded by the National Natural Science Foundation of China (grant number 62162043).
Author information
Contributions
Shaoping Xu contributed to the conception of the study; Nan Xiao wrote the main manuscript text; Wuyong Tao and Changfei Zhou contributed significantly to the analysis and manuscript preparation; Minghai Xiong conducted the experiments. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethics approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, S., Xiao, N., Tao, W. et al. An effective masked transformer network for image denoising. SIViP (2024). https://doi.org/10.1007/s11760-024-03210-4