Improvement of Text Image Super-Resolution Benefiting Multi-task Learning

Honda, Kosuke; Fujita, Hamido; Kurematsu, Masaki

doi:10.1007/978-3-031-08530-7_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13343)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

Text image super-resolution is a pre-processing step for scene text recognition that aims to improve the visual quality of text in low-resolution images. However, existing super-resolution (SR) models designed for general images have difficulty recovering text from low-resolution images of real scenes. There are several reasons for this, including that these models do not consider text-specific properties and that the background is not important for text image SR. In this paper, we propose TRSRT, a multi-task learning model that performs reconstruction and SR of text images using a transformer. Compared to the SR model, the reconstruction model is better at denoising and tends to retain structural information about the text. Building on this observation, the proposed method transfers these properties of the reconstruction model to the SR model through the transformer. In addition, we attempt to obtain a text-specific model by training with three loss functions, including a feature-driven loss that uses a text recognizer. Experimental results on TextZoom show that the proposed method achieves performance comparable to state-of-the-art methods and demonstrate the advantages of multi-task learning.
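As a rough illustration of how a multi-task objective with three loss terms might be combined, the sketch below sums an SR term, a reconstruction term, and a feature-driven term computed on recognizer features. The abstract names only the feature-driven loss, so the other two terms, the weights, the reconstruction target `rec_target`, and every function name here are assumptions made for illustration, not the paper's actual formulation.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multitask_loss(
    sr_out: Vector,
    rec_out: Vector,
    hr_target: Vector,
    rec_target: Vector,
    recognizer_features: Callable[[Vector], Vector],
    weights: Tuple[float, float, float] = (1.0, 1.0, 0.1),
) -> float:
    """Weighted sum of three terms (all choices here are assumptions).

    sr_out / hr_target:    SR branch output and high-resolution target.
    rec_out / rec_target:  reconstruction branch output and its target.
    recognizer_features:   stand-in for a frozen text recognizer's
                           feature extractor (hypothetical).
    """
    w_sr, w_rec, w_feat = weights
    sr_term = mse(sr_out, hr_target)
    rec_term = mse(rec_out, rec_target)
    # Feature-driven term: compare recognizer features of the SR output
    # with those of the high-resolution target.
    feat_term = mse(recognizer_features(sr_out), recognizer_features(hr_target))
    return w_sr * sr_term + w_rec * rec_term + w_feat * feat_term


def toy_features(v: Vector) -> Vector:
    """Toy 'recognizer': neighbouring differences, a crude edge response."""
    return [v[i + 1] - v[i] for i in range(len(v) - 1)]


if __name__ == "__main__":
    loss = multitask_loss(
        sr_out=[0.0, 0.0],
        rec_out=[0.0, 1.0],
        hr_target=[0.0, 2.0],
        rec_target=[0.0, 1.0],
        recognizer_features=toy_features,
    )
    print(f"combined loss: {loss:.3f}")
```

In a real training setup the vectors would be image tensors and `recognizer_features` would be a pretrained text recognizer; the toy edge-difference function only stands in for it so the example runs without external dependencies.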

Acknowledgements

This study is supported by JSPS/JAPAN KAKENHI (Grants-in-Aid for Scientific Research) #JP20K11955.

Author information

Authors and Affiliations

Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan
Kosuke Honda & Masaki Kurematsu
Regional Research Center, Iwate Prefectural University, Iwate, Japan
Hamido Fujita

Corresponding author

Correspondence to Kosuke Honda.

Editor information

Editors and Affiliations

i-SOMET, Inc., Morioka-shi, Iwate, Japan
Hamido Fujita
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
Philippe Fournier-Viger
Texas State University, San Marcos, TX, USA
Moonis Ali
Shanghai University of Finance and Economics, Shanghai, China
Yinglin Wang

About this paper

Cite this paper

Honda, K., Fujita, H., Kurematsu, M. (2022). Improvement of Text Image Super-Resolution Benefiting Multi-task Learning. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science, vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_23

DOI: https://doi.org/10.1007/978-3-031-08530-7_23
Published: 30 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08529-1
Online ISBN: 978-3-031-08530-7
eBook Packages: Computer Science, Computer Science (R0)
