Parametric loss-based super-resolution for scene text recognition

Viriyavisuthisakul, Supatta; Sanguansat, Parinya; Racharak, Teeradaj; Nguyen, Minh Le; Kaothanthong, Natsuda; Haruechaiyasak, Choochart; Yamasaki, Toshihiko

doi:10.1007/s00138-023-01416-z

Original Paper
Published: 28 June 2023

Volume 34, article number 61, (2023)
Machine Vision and Applications

Abstract

Scene text image super-resolution (STISR) is the process of enhancing low-resolution scene text images in order to improve text recognition accuracy. Recently, a text attention network was introduced to reconstruct high-resolution scene text images; its backbone combines convolutional neural network-based and transformer-based architectures. Although it can handle rotated and curved text, it still cannot properly handle images containing irregularly shaped text and blurred text regions, which can lead to incorrect predictions in the text recognition step. In this study, we propose applying multiple parametric regularizations and parametric weight parameters to the loss function of the STISR method to improve both scene text image quality and text recognition accuracy. We design three variants of the method: adding multiple parametric regularizations, modifying the parametric weight parameters, and combining parametric weights with multiple parametric regularizations. Experiments were conducted and the results compared with state-of-the-art models. Every proposed variant showed a significant improvement. Moreover, our methods generated clearer and sharper edges than the baseline and achieved better image quality scores.
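
To make the three variants named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state which loss or regularization terms are used, so the pixel-wise mean squared error, the total variation term, and the L2 energy term below are stand-in assumptions, and all function and parameter names are hypothetical. In an actual training setup the values of `weight` and `reg_params` would presumably be trainable; here they are plain floats.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def mse(sr: Image, hr: Image) -> float:
    """Pixel-wise mean squared error (assumed stand-in for the base loss)."""
    n = sum(len(row) for row in sr)
    return sum((a - b) ** 2 for rs, rh in zip(sr, hr) for a, b in zip(rs, rh)) / n


def total_variation(img: Image) -> float:
    """Anisotropic total variation: sum of absolute neighbour differences."""
    horizontal = sum(abs(row[j + 1] - row[j]) for row in img for j in range(len(row) - 1))
    vertical = sum(
        abs(img[i + 1][j] - img[i][j])
        for i in range(len(img) - 1)
        for j in range(len(img[0]))
    )
    return horizontal + vertical


def l2_energy(img: Image) -> float:
    """Sum of squared pixel values (a second, assumed regularizer)."""
    return sum(p * p for row in img for p in row)


def parametric_loss(sr: Image, hr: Image, weight: float, reg_params: Sequence[float]) -> float:
    """weight * base loss + sum of parameter-scaled regularizers.

    Hypothetical mapping onto the three variants in the abstract:
      1. regularizations only: weight = 1.0, reg_params non-zero
      2. weight only:          weight != 1.0, reg_params all zero
      3. combined:             both set
    """
    regs = (total_variation(sr), l2_energy(sr))
    return weight * mse(sr, hr) + sum(lam * r for lam, r in zip(reg_params, regs))


if __name__ == "__main__":
    sr = [[0.0, 1.0], [1.0, 0.0]]  # toy super-resolved output
    hr = [[0.0, 1.0], [1.0, 1.0]]  # toy ground-truth image
    print(parametric_loss(sr, hr, 1.0, (0.1, 0.01)))  # variant 1
    print(parametric_loss(sr, hr, 2.0, (0.0, 0.0)))   # variant 2
    print(parametric_loss(sr, hr, 2.0, (0.1, 0.01)))  # variant 3
```

Keeping the weight and the regularization parameters as separate arguments means each variant can be switched on independently, which mirrors how the abstract describes the three designs.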

Acknowledgements

This research was supported by the research fund of the Japan Advanced Institute of Science and Technology (JAIST), Japan, and by the research fund of Sirindhorn International Institute of Technology (SIIT), Thammasat University, and the National Electronics and Computer Technology Centre (NECTEC), Thailand.

Author information

Authors and Affiliations

School of Management Technology, Sirindhorn International Institute of Technology, Thammasat University, Khlong Luang, Pathum Thani, 12000, Thailand
Supatta Viriyavisuthisakul & Natsuda Kaothanthong
School of Information Science, Japan Advanced Institute of Science and Technology, Nomi city, Ishikawa, 923-1211, Japan
Supatta Viriyavisuthisakul, Teeradaj Racharak & Minh Le Nguyen
Faculty of Engineering and Technology, Panyapiwat Institute of Management, Nonthaburi, 11120, Thailand
Parinya Sanguansat
National Electronics and Computer Technology Center, Khlong Luang, Pathum Thani, 10400, Thailand
Choochart Haruechaiyasak
Department of Information and Communication Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
Toshihiko Yamasaki

Corresponding author

Correspondence to Parinya Sanguansat.

About this article

Cite this article

Viriyavisuthisakul, S., Sanguansat, P., Racharak, T. et al. Parametric loss-based super-resolution for scene text recognition. Machine Vision and Applications 34, 61 (2023). https://doi.org/10.1007/s00138-023-01416-z

Received: 23 January 2023
Revised: 04 May 2023
Accepted: 12 June 2023
Published: 28 June 2023
DOI: https://doi.org/10.1007/s00138-023-01416-z
