Skip to main content
Log in

PCCFormer: Parallel coupled convolutional transformer for image super-resolution

  • Research
  • Published:
The Visual Computer Aims and scope Submit manuscript

Abstract

In recent years, with the continuous development of deep learning technology, image super-resolution (SR) has witnessed tremendous progress. Transformer as an emerging deep learning model, its application in the field of image SR is also gradually attracted the attention of researchers. In this paper, we propose parallel coupled convolutional transformer (PCCFormer) architecture for image SR. The proposed design uses transformer and convolutional neural network (CNN) as the structure of the network, thus leveraging the complementary benefits. We use parallel attention transformer to improve feature expression ability of the model. We introduce adaptive convolution residual block to effectively extract the image local information. We introduce a novel concurrent module to aggregate enriched features, including long-range features from transformer and local features captured by CNN. Extensive experiments conducted on benchmark datasets illustrate that PCCFormer surpasses existing image SR methods. For example, our PCCFormer achieves the PSNR of 34.01 dB on Urban100 dataset with scale factor 2.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Data availability

The dataset and code used to support this study are available from the corresponding author upon request.

References

  1. Arefin, M.R., Michalski, V., St-Charles, P.-L., Kalaitzis, A., Kim, S., Kahou, S.E., Bengio, Y.: Multi-image super-resolution for remote sensing using deep recurrent networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 206–207 (2020)

  2. Vavilala, V., Meyer, M.: Deep learned super resolution for feature film production. In: Special Interest Group on Computer Graphics and Interactive Techniques Conference Talks, pp. 1–2 (2020)

  3. Fang, H., Deng, W., Zhong, Y., Hu, J.: Generate to adapt: Resolution adaption network for surveillance face recognition. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XV 16, pp. 741–758 (2020). Springer

  4. Zhang, Y., Li, K., Li, K., Fu, Y.: Mr image super-resolution with squeeze and excitation reasoning attention network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13425–13434 (2021)

  5. Gavade, A., Sane, P.: Super resolution image reconstruction by using bicubic interpolation. In: National Conference on Advanced Technologies in Electrical and Electronic Systems, vol. 10 (2014)

  6. Irfan, M.A., Khan, S., Arif, A., Khan, K., Khaliq, A., Memon, Z.A., Ismail, M.: Single image super resolution technique: an extension to true color images. Symmetry 11(4), 464 (2019)

    Article  ADS  Google Scholar 

  7. Zhu, X., Wang, X., Wang, J., Jin, P., Liu, L., Mei, D., et al.: Image super-resolution based on sparse representation via direction and edge dictionaries. Math. Prob. Eng. 2017, (2017)

  8. Maeda, S.: Image super-resolution with deep dictionary. In: European Conference on Computer Vision, pp. 464–480 (2022). Springer

  9. Wang, Z., Liu, D., Yang, J., Han, W., Huang, T.: Deep networks for image super-resolution with sparse prior. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 370–378 (2015)

  10. Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833–1844 (2021)

  11. Sun, N., Li, H.: Super resolution reconstruction of images based on interpolation and full convolutional neural network and application in medical fields. IEEE Access 7, 186470–186479 (2019)

    Article  Google Scholar 

  12. Shiina, K., Mori, H., Tomita, Y., Lee, H.K., Okabe, Y.: Inverse renormalization group based on image super-resolution using deep convolutional networks. Sci. Rep. 11(1), 9617 (2021)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  13. Dong, C., Loy, C.C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pp. 391–407 (2016). Springer

  14. Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3147–3155 (2017)

  15. Tai, Y., Yang, J., Liu, X., Xu, C.: Memnet: A persistent memory network for image restoration. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4539–4547 (2017)

  16. Kim, J.-H., Choi, J.-H., Cheon, M., Lee, J.-S.: Ram: Residual attention module for single image super-resolution. arXiv preprint arXiv:1811.120432(1), 2 (2018)

  17. Ying, X., Wang, Y., Wang, L., Sheng, W., An, W., Guo, Y.: A stereo attention module for stereo image super-resolution. IEEE Signal Process. Lett. 27, 496–500 (2020)

    Article  ADS  Google Scholar 

  18. Chen, H., Gu, J., Zhang, Z.: Attention in attention network for image super-resolution. arXiv preprint arXiv:2104.09497 (2021)

  19. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  20. Chen, H., Wang, Y., Guo, T., Xu, C., Deng, Y., Liu, Z., Ma, S., Xu, C., Xu, C., Gao, W.: Pre-trained image processing transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12299–12310 (2021)

  21. Cao, J., Liang, J., Zhang, K., Li, Y., Zhang, Y., Wang, W., Gool, L.V.: Reference-based image super-resolution with deformable attention transformer. In: European Conference on Computer Vision, pp. 325–342 (2022). Springer

  22. Chen, X., Wang, X., Zhou, J., Qiao, Y., Dong, C.: Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22367–22377 (2023)

  23. Kim, J., Lee, J.K., Lee, K.M.: Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654 (2016)

  24. Lai, W.-S., Huang, J.-B., Ahuja, N., Yang, M.-H.: Deep laplacian pyramid networks for fast and accurate super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 624–632 (2017)

  25. Zhang, K., Zuo, W., Zhang, L.: Learning a single convolutional super-resolution network for multiple degradations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3262–3271 (2018)

  26. Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144 (2017)

  27. Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472–2481 (2018)

  28. Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 286–301 (2018)

  29. He, X., Mo, Z., Wang, P., Liu, Y., Yang, M., Cheng, J.: Ode-inspired network design for single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1732–1741 (2019)

  30. Zhou, S., Zhang, J., Zuo, W., Loy, C.C.: Cross-scale internal graph neural network for image super-resolution. Adv. Neural Inf. Process. Syst. 33, 3499–3509 (2020)

    Google Scholar 

  31. Mei, Y., Fan, Y., Zhou, Y., Huang, L., Huang, T.S., Shi, H.: Image super-resolution with cross-scale non-local attention and exhaustive self-exemplars mining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5690–5699 (2020)

  32. Niu, B., Wen, W., Ren, W., Zhang, X., Yang, L., Wang, S., Zhang, K., Cao, X., Shen, H.: Single image super-resolution via a holistic attention network. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16, pp. 191–207 (2020). Springer

  33. Zhang, X., Zeng, H., Guo, S., Zhang, L.: Efficient long-range attention network for image super-resolution. In: European Conference on Computer Vision, pp. 649–667 (2022). Springer

  34. Xia, B., Hang, Y., Tian, Y., Yang, W., Liao, Q., Zhou, J.: Efficient non-local contrastive attention for image super-resolution. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 2759–2767 (2022)

  35. Su, J.-N., Gan, M., Chen, G.-Y., Yin, J.-L., Chen, C.P.: Global learnable attention for single image super-resolution. IEEE Trans. Patt. Anal. Mach. Intell. (2022)

  36. Wang, Z., Chen, J., Hoi, S.C.: Deep learning for image super-resolution: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3365–3387 (2020)

    Article  Google Scholar 

  37. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30, (2017)

  38. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV 13, pp. 184–199 (2014). Springer

  39. Kim, J., Lee, J.K., Lee, K.M.: Deeply-recursive convolutional network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1645 (2016)

  40. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  41. Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883 (2016)

  42. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  43. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)

  44. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020). Springer

  45. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)

  46. Li, W., Lu, X., Qian, S., Lu, J., Zhang, X., Jia, J.: On efficient transformer-based image pre-training for low-level vision. arXiv preprint arXiv:2112.10175 (2021)

  47. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. Adv. Neural Inf. Process. Syst. 27, (2014)

  48. Feichtenhofer, C., Pinz, A., Wildes, R.P.: Spatiotemporal multiplier networks for video action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4768–4777 (2017)

  49. Chen, Y., Dai, X., Chen, D., Liu, M., Dong, X., Yuan, L., Liu, Z.: Mobile-former: Bridging mobilenet and transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5270–5279 (2022)

  50. Jin, Z., Iqbal, M.Z., Bobkov, D., Zou, W., Li, X., Steinbach, E.: A flexible deep cnn framework for image restoration. IEEE Trans. Multim. 22(4), 1055–1068 (2019)

    Article  Google Scholar 

  51. Isobe, T., Jia, X., Gu, S., Li, S., Wang, S., Tian, Q.: Video super-resolution with recurrent structure-detail network. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16, pp. 645–660 (2020). Springer

  52. Wen, R., Yang, Z., Chen, T., Li, H., Li, K.: Progressive representation recalibration for lightweight super-resolution. Neurocomputing 504, 240–250 (2022)

    Article  Google Scholar 

  53. Zou, W., Ye, T., Zheng, W., Zhang, Y., Chen, L., Wu, Y.: Self-calibrated efficient transformer for lightweight super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 930–939 (2022)

  54. Yoo, J., Kim, T., Lee, S., Kim, S.H., Lee, H., Kim, T.H.: Enriched cnn-transformer feature aggregation networks for super-resolution. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 4956–4965 (2023)

  55. Lu, Z., Li, J., Liu, H., Huang, C., Zhang, L., Zeng, T.: Transformer for single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 457–466 (2022)

  56. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)

  57. Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016)

  58. Timofte, R., Gu, S., Wu, J., Van Gool, L.: Ntire 2018 challenge on single image super-resolution: Methods and results. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 852–863 (2018)

  59. Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding (2012)

  60. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE transactions on image processing 19(11), 2861–2873 (2010)

    Article  ADS  MathSciNet  PubMed  Google Scholar 

  61. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2, pp. 416–423 (2001). IEEE

  62. Huang, J.-B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5197–5206 (2015)

  63. Dai, T., Cai, J., Zhang, Y., Xia, S.-T., Zhang, L.: Second-order attention network for single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11065–11074 (2019)

  64. Mei, Y., Fan, Y., Zhou, Y.: Image super-resolution with non-local sparse attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3517–3526 (2021)

Download references

Funding

This research was supported by the International Partnership Program of Chinese Academy of Sciences (no.184131KYSB20200033).

Author information

Authors and Affiliations

Authors

Contributions

All authors have made contributions to this work. Individual contributions are as follows: B.H. was involved in conceptualization, methodology, software, and writing—original draft preparation; L.G. was involved in writing—review and editing, and funding acquisition.

Corresponding author

Correspondence to Bowen Hou.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hou, B., Li, G. PCCFormer: Parallel coupled convolutional transformer for image super-resolution. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03257-3

Download citation

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00371-023-03257-3

Keywords

Navigation