An efficient parallel fusion structure of distilled and transformer-enhanced modules for lightweight image super-resolution


Abstract

Although convolutional neural network (CNN) and transformer methods have greatly advanced image super-resolution, each approach has its own drawbacks. Trading off between the two and effectively integrating their advantages makes it possible to restore high-frequency image information with fewer parameters and higher quality. Hence, this study proposes PFDFCT, a novel dual-branch parallel fusion model that combines a distilled feature pyramid with a serial CNN and transformer. In one branch, a lightweight serial structure of CNN and transformer guarantees the richness of the global features extracted by the transformer. In the other branch, an efficient distillation feature pyramid hybrid attention module purifies the local features extracted by the CNN and maintains their integrity through cross-fusion. This multi-path parallel fusion strategy ensures the richness and accuracy of features while avoiding complex, long-range structures. The results show that, compared with other advanced methods, PFDFCT reduces mis-generated stripes and produces clearer reconstructions for both easy-to-reconstruct and difficult-to-reconstruct targets. PFDFCT also achieves a remarkable advance in model size and computational cost. Compared with the state-of-the-art (SOTA) efficient long-range attention network of 2022, PFDFCT reduces parameters and floating-point operations (FLOPs) by more than 20% and 26%, respectively, at all three scales, while maintaining similarly advanced reconstruction ability. The FLOPs of PFDFCT are as low as 31.8G, 55.3G, and 122.5G at scales of 2, 3, and 4, much lower than most current SOTA methods.
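
To make the dual-branch idea concrete, the following is a minimal PyTorch sketch of a parallel fusion super-resolution block in the spirit described above: one branch chains convolutions with a lightweight self-attention (transformer) stage for global context, the other applies a distillation-style convolutional path with a channel-attention gate for local detail, and a 1x1 convolution fuses the two before a pixel-shuffle upsampler. This is not the authors' PFDFCT implementation; all module names, channel widths, and depths below are illustrative assumptions.

# A minimal sketch (not the authors' code) of a dual-branch parallel fusion
# super-resolution network: a global CNN+transformer branch, a local
# distillation-style attention branch, 1x1 fusion, and pixel-shuffle upsampling.
import torch
import torch.nn as nn


class SerialCNNTransformerBranch(nn.Module):
    """Convolutions followed by multi-head self-attention over spatial tokens."""

    def __init__(self, channels: int = 48, heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        y = self.conv(x)
        tokens = y.flatten(2).transpose(1, 2)          # (B, H*W, C)
        attn_out, _ = self.attn(self.norm(tokens), self.norm(tokens), self.norm(tokens))
        return y + attn_out.transpose(1, 2).reshape(b, c, h, w)


class DistilledAttentionBranch(nn.Module):
    """Local feature path: stacked convs with a squeeze-and-excitation-style gate."""

    def __init__(self, channels: int = 48, reduction: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.GELU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.body(x)
        return x + y * self.gate(y)


class ParallelFusionSR(nn.Module):
    """Two parallel branches fused by a 1x1 conv, then a pixel-shuffle upsampler."""

    def __init__(self, channels: int = 48, scale: int = 4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.global_branch = SerialCNNTransformerBranch(channels)
        self.local_branch = DistilledAttentionBranch(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.tail = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.head(x)
        fused = self.fuse(torch.cat([self.global_branch(f), self.local_branch(f)], dim=1))
        return self.tail(fused + f)


if __name__ == "__main__":
    lr = torch.randn(1, 3, 64, 64)                     # dummy low-resolution patch
    print(ParallelFusionSR(scale=4)(lr).shape)         # torch.Size([1, 3, 256, 256])

In a full model the two branches would typically be repeated and cross-fused several times before upsampling; a single fusion stage is shown here only to illustrate the data flow.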





Acknowledgements

This work was financially supported by the National Key Research and Development Program of China under Grant 2022YFB3706902 and the Hunan Provincial Natural Science Foundation of China under Grant 2022JJ40614.

Author information


Contributions

GQW designed and performed the experiments, derived the models, and analyzed the data. MSC verified the analytical methods and oversaw the overall direction and planning. YCL supervised the project and provided financial support. XHT and CZZ verified the experimental results and supervised the findings of this work. WXY and BHG both contributed to the final version of the manuscript. WDZ revised the grammar. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Mingsong Chen, Yongcheng Lin, Xianhua Tan or Chizhou Zhang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, G., Chen, M., Lin, Y. et al. An efficient parallel fusion structure of distilled and transformer-enhanced modules for lightweight image super-resolution. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03243-9

