Scene text image super-resolution using multi-scale convolutional neural network with skip connections

Walha, Rim; Aouini, Amal

doi:10.1007/s10489-024-05471-5

Published: 03 May 2024

Volume 54, pages 5931–5943 (2024)

Applied Intelligence

Abstract

Scene text image super-resolution is a challenging task that aims to enhance the spatial resolution of low-resolution text images captured in the wild, thereby improving visual quality and boosting the performance of real-world text-related applications. However, most previous super-resolution methods ignore the specific characteristics of text patterns and treat scene text images as natural scene images. In this paper, a novel deep convolutional architecture is proposed specifically for the super-resolution of scene text images. To recover the fine details of low-resolution characters, the architecture relies on three main design elements: (1) Multi-scale feature extraction with parallel convolutional layers, which preserves both the local and the global high-frequency components that encode the intricate details of character patterns. This allows the method to capture fine nuances in the visual representation of characters and enriches the extracted features. (2) Skip connections through the convolutional layers, which ease the flow of information from lower to higher layers of the deep architecture so that sequential information about text patterns is preserved more effectively. (3) A specialized network-in-network-based reconstruction stage, which recovers high-resolution text details from the collected features. This paradigm minimizes information loss and improves the method’s ability to discern and reconstruct fine textual details.
Together, these design elements enable the method to analyze fine text patterns and reconstruct the details of low-resolution characters at high resolution. Quantitative and qualitative evaluations on four well-known benchmarks, namely the SVT, IIIT5K, IC03 and ICDAR2015-TextSR datasets, demonstrate the effectiveness of the proposal, whose performance surpasses that of several state-of-the-art super-resolution methods.
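The three design elements named in the abstract can be illustrated with a toy forward pass. The sketch below is not the authors' network: it uses plain Python lists, hand-set weights, a single-channel image, and no training or upscaling step. The function names (`conv2d_same`, `multi_scale_features`, `pointwise_mix`, `forward`) are hypothetical, and routing the skip connection by concatenating the raw input with the branch outputs is an assumption about one common way to wire it.

```python
# Toy illustration only: hand-set weights, one grayscale image, no upscaling.

def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation that keeps the input size.

    Assumes a square kernel with odd side length.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * image[yy][xx]
            out[y][x] = acc
    return out


def relu(fmap):
    """Element-wise rectifier."""
    return [[max(0.0, v) for v in row] for row in fmap]


def multi_scale_features(image, kernels):
    """Apply kernels of different sizes in parallel to the same input."""
    return [relu(conv2d_same(image, kernel)) for kernel in kernels]


def pointwise_mix(feature_maps, weights):
    """1x1 convolution: a per-pixel weighted sum across channels,
    the basic building block of network-in-network layers."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(wt * fm[y][x] for wt, fm in zip(weights, feature_maps))
         for x in range(w)]
        for y in range(h)
    ]


def forward(image, kernels, mix_weights):
    """Parallel multi-scale branches, a skip connection, then 1x1 mixing."""
    branches = multi_scale_features(image, kernels)
    # Skip connection (assumed wiring): the raw input bypasses the branches
    # and joins their outputs as an extra channel.
    channels = branches + [image]
    return pointwise_mix(channels, mix_weights)


def box_kernel(size):
    """Averaging kernel, used here as a stand-in for learned weights."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# One 3x3 branch (local context) and one 5x5 branch (wider context).
output = forward(image, [box_kernel(3), box_kernel(5)], [0.5, 0.25, 0.25])
```

In a real model the kernels and mixing weights would be learned, each branch would produce many channels, and an upsampling step would follow; a deep learning framework would replace the explicit loops.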

Data availability and access

Public datasets are employed in this research, including the ICDAR2003 (IC03) dataset [39], the ICDAR2015-TextSR dataset [40], the IIIT 5K-words (IIIT5K) dataset [41], and the Street View Text (SVT) dataset [42].

References

Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S (2017) Focusing attention: Towards accurate text recognition in natural images. In: IEEE International Conference on Computer Vision (ICCV), Venice, Italy. pp 5086–5094. https://doi.org/10.1109/ICCV.2017.543
Mallek A, Drira F, Walha R, Alimi AM, Lebourgeois F (2017) Deep learning with sparse prior - application to text detection in the wild. In: 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP, Volume 5: VISAPP, Porto, Portugal, 2017. pp 243–250
Liu Z, Li Y, Ren F, Goh WL, Yu H (2018) Squeezedtext: A real-time scene text recognition by binary convolutional encoder-decoder network. In: AAAI Conference on Artificial Intelligence (AAAI), Louisiana, USA. pp 7194–7201
Luo C, Jin L, Sun Z (2019) MORAN: A multi-object rectified attention network for scene text recognition. Pattern Recognit. 90:109–118. https://doi.org/10.1016/j.patcog.2019.01.020
Harizi R, Walha R, Drira F (2022) Deep-learning based end-to-end system for text reading in the wild. Multim Tools Appl 81(17):24691–24719. https://doi.org/10.1007/s11042-022-11998-x
Harizi R, Walha R, Drira F, Zaied M (2022) Convolutional neural network with joint stepwise character/word modeling based system for scene text recognition. Multim Tools Appl 81(3):3091–3106. https://doi.org/10.1007/s11042-021-10663-z
Walha R, Drira F, Lebourgeois F, Garcia C, Alimi AM (2018) Handling noise in textual image resolution enhancement using online and offline learned dictionaries. Int J Document Anal Recognit 21(1–2):137–157. https://doi.org/10.1007/s10032-017-0294-6
Walha R, Drira F, Lebourgeois F, Garcia C, Alimi AM (2015) Resolution enhancement of textual images via multiple coupled dictionaries and adaptive sparse representation selection. Int J Document Anal Recognit 18(1):87–107. https://doi.org/10.1007/s10032-014-0235-6
Walha R, Drira F, Lebourgeois F, Garcia C, Alimi AM (2015) Joint denoising and magnification of noisy low-resolution textual images. In: 13th International Conference on Document Analysis and Recognition, ICDAR, Nancy, France, 2015. pp 871–875. https://doi.org/10.1109/ICDAR.2015.7333886
Chen J, Li B, Xue X (2021) Scene text telescope: Text-focused scene image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp 12026–12035. https://doi.org/10.1109/CVPR46437.2021.01185
Xue M, Huang Z, Liu R, Lu T (2021) A novel attention enhanced residual-in-residual dense network for text image super-resolution. In: IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China. pp 1–6. https://doi.org/10.1109/ICME51207.2021.9428128
Jain D, Prabhu AD, Ramena G, Goyal M, Mohanty DP, Moharana S, Purre N (2020) On-device text image super resolution. In: 25th International Conference on Pattern Recognition (ICPR), Milan, Italy. pp 5775–5781. https://doi.org/10.1109/ICPR48806.2021.9412222
Geng C, Chen L, Zhang X, Gao Z (2020) Adversarial text image super-resolution using sinkhorn distance. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain. pp 2663–2667. https://doi.org/10.1109/ICASSP40776.2020.9054360
Wang W, Xie E, Liu X, Wang W, Liang D, Shen C, Bai X (2020) Scene text image super-resolution in the wild. In: 16th European Conference on Computer Vision (ECCV), Glasgow, UK, Proceedings, Part X, vol. 12355. pp 650–666. https://doi.org/10.1007/978-3-030-58607-2_38
Mou Y, Tan L, Yang H, Chen J, Liu L, Yan R, Huang Y (2020) Plugnet: Degradation aware scene text recognition supervised by a pluggable super-resolution unit. In: 16th European Conference on Computer Vision (ECCV), Glasgow, UK, Proceedings, Part XV, vol. 12360. pp 158–174. https://doi.org/10.1007/978-3-030-58555-6_10
Wang Y, Su F, Qian Y (2019) Text-attentional conditional generative adversarial network for super-resolution of text images. In: IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 2019. pp 1024–1029. https://doi.org/10.1109/ICME.2019.00180
Liu W, Chen C, Wong KK, Su Z, Han J (2016) Star-net: A spatial attention residue network for scene text recognition. In: Wilson RC, Hancock ER, Smith WAP (eds) British Machine Vision Conference (BMVC). York, UK
Dong C, Zhu X, Deng Y, Loy CC, Qiao Y (2015) Boosting optical character recognition: A super-resolution approach. CoRR. arXiv:1506.02211
Ma J, Liang Z, Zhang L (2022) A text attention network for spatial deformation robust scene text image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA. pp 5901–5910. https://doi.org/10.1109/CVPR52688.2022.00582
Dong C, Loy CC, He K, Tang X (2016) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307. https://doi.org/10.1109/TPAMI.2015.2439281
Yamanaka J, Kuwashima S, Kurita T (2017) Fast and accurate image super resolution by deep CNN with skip connection and network in network. In: International Conference on Neural Information Processing (ICONIP), Guangzhou, China, Proceedings, Part II. pp 217–225. https://doi.org/10.1007/978-3-319-70096-0_23
Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y (2018) Residual dense network for image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), UT, USA, 2018. pp 2472–2481. https://doi.org/10.1109/CVPR.2018.00262
Walha R, Drira F, Lebourgeois F, Alimi AM, Garcia C (2016) Resolution enhancement of textual images: a survey of single image-based methods. IET Image Process 10(4):325–337. https://doi.org/10.1049/iet-ipr.2015.0334
Thouin PD, Chang C (2000) A method for restoration of low-resolution document images. Int J Document Anal Recognit 2(4):200–210. https://doi.org/10.1007/PL00021526
Luong HQ, Philips W (2008) Robust reconstruction of low-resolution document images by exploiting repetitive character behaviour. Int J Document Anal Recognit 11(1):39–51. https://doi.org/10.1007/s10032-008-0068-2
Li X, Orchard MT (2001) New edge-directed interpolation. IEEE Trans Image Process 10(10):1521–1527. https://doi.org/10.1109/83.951537
Walha R, Drira F, Lebourgeois F, Garcia C, Alimi AM (2013) Single textual image super-resolution using multiple learned dictionaries based sparse coding. In: 17th International Conference on Image Analysis and Processing, (ICIAP), Naples, Italy, Proceedings, Part II, vol. 8157. pp 439–448. https://doi.org/10.1007/978-3-642-41184-7_45
Walha R, Drira F, Alimi AM, Lebourgeois F, Garcia C (2014) A sparse coding based approach for the resolution enhancement and restoration of printed and handwritten textual images. In: 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), Crete, Greece. pp 696–701. https://doi.org/10.1109/ICFHR.2014.122
Lim B, Son S, Kim H, Nah S, Lee KM (2017) Enhanced deep residual networks for single image super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA. pp 1132–1140. https://doi.org/10.1109/CVPRW.2017.151
Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken AP, Tejani A, Totz J, Wang Z, Shi W (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. pp 105–114. https://doi.org/10.1109/CVPR.2017.19
Lai W, Huang J, Ahuja N, Yang M (2017) Deep laplacian pyramid networks for fast and accurate super-resolution. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA. pp 5835–5843. https://doi.org/10.1109/CVPR.2017.618
Zhang L, Jiang W, Xiang W (2022) Dictionary learning based on structural self-similarity and convolution neural network. J Ambient Intell Humaniz Comput 13(3):1463–1470. https://doi.org/10.1007/s12652-020-02739-9
Li P, Li Z, Pang X, Wang H, Lin W, Wu W (2022) Multi-scale residual denoising GAN model for producing super-resolution CTA images. J Ambient Intell Humaniz Comput 13(3):1515–1524. https://doi.org/10.1007/s12652-021-03009-y
Dong C, Loy CC, He K, Tang X (2014) Learning a deep convolutional network for image super-resolution. In: European Conference on Computer Vision (ECCV), Zurich, Switzerland, Proceedings, Part IV, vol. 8692. pp 184–199. https://doi.org/10.1007/978-3-319-10593-2_13
Kim J, Lee JK, Lee KM (2016) Accurate image super-resolution using very deep convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA. pp 1646–1654. https://doi.org/10.1109/CVPR.2016.182
Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, Qiao Y, Loy CC (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In: Computer Vision - ECCV 2018 Workshops - Munich, Germany, Proceedings, Part V, vol. 11133. pp 63–79. https://doi.org/10.1007/978-3-030-11021-5_5
Ma J, Guo S, Zhang L (2023) Text prior guided scene text image super-resolution. IEEE Trans Image Process 32:1341–1353. https://doi.org/10.1109/TIP.2023.3237002
Lin M, Chen Q, Yan S (2014) Network in network. In: International Conference on Learning Representations (ICLR), Banff, AB, Canada
Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part IV. pp 512–528. https://doi.org/10.1007/978-3-319-10593-2_34
Peyrard C, Baccouche M, Mamalet F, Garcia C (2015) ICDAR2015 competition on text image super-resolution. In: 13th International Conference on Document Analysis and Recognition (ICDAR), Nancy, France, 2015. pp 1201–1205. https://doi.org/10.1109/ICDAR.2015.7333951
Mishra A, Alahari K, Jawahar CV (2012) Scene text recognition using higher order language priors. In: British Machine Vision Conference, BMVC 2012, Surrey, UK, September 3-7, 2012. pp 1–11. https://doi.org/10.5244/C.26.127
Wang K, Belongie SJ (2010) Word spotting in the wild. In: Computer Vision - ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part I. pp 591–604. https://doi.org/10.1007/978-3-642-15549-9_43
Walha R, Drira F, Lebourgeois F, Alimi AM (2012) Super-resolution of single text image by sparse representation. In: Proceeding of the Workshop on Document Analysis and Recognition, DAR@ICVGIP 2012, Mumbai, India. pp 22–29. https://doi.org/10.1145/2432553.2432558
Walha R, Drira F, Lebourgeois F CharImageDB: Character Image Dataset. IEEE Dataport. https://doi.org/10.21227/xdgk-ad26
Zeyde R, Elad M, Protter M (2010) On single image scale-up using sparse representations. In: 7th International Conference on Curves and Surfaces, Avignon, France, 2010. pp 711–730. https://doi.org/10.1007/978-3-642-27413-8_47
Timofte R, Smet VD, Gool LV (2014) A+: adjusted anchored neighborhood regression for fast super-resolution. In: 12th Asian Conference on Computer Vision (ACCV), Singapore, 2014, Part IV. pp 111–126. https://doi.org/10.1007/978-3-319-16817-3_8
Wang W, Xie E, Sun P, Wang W, Tian L, Shen C, Luo P (2019) Textsr: Content-aware text super-resolution guided by recognition. CoRR. arXiv:1909.07113

Author information

Authors and Affiliations

Research Groups in Intelligent Machines, REGIM-Lab, National Engineering School of Sfax, University of Sfax, BP 1173, 3038, Sfax, Tunisia
Rim Walha
Higher Institute of Computer Science and Multimedia of Sfax, University of Sfax, Technological pole of Sfax, BP 242, 3021, Sfax, Tunisia
Rim Walha & Amal Aouini

Contributions

R. Walha and A. Aouini contributed to the design and implementation of this research, to the analysis of the results, and to the writing of the manuscript.

Corresponding author

Correspondence to Rim Walha.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.

Ethics approval and consent to participate

This article does not contain any studies with human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Walha, R., Aouini, A. Scene text image super-resolution using multi-scale convolutional neural network with skip connections. Appl Intell 54, 5931–5943 (2024). https://doi.org/10.1007/s10489-024-05471-5

Accepted: 18 April 2024
Published: 03 May 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10489-024-05471-5
