Text Detection and Recognition Using Augmented Reality and Deep Learning

Part of the Lecture Notes in Networks and Systems book series (LNNS,volume 449)

Abstract

In recent years, the detection and recognition of text in natural images has become an attractive and important subject for researchers. Many applications have been developed for text detection and recognition, and the majority of them are based on deep learning (DL) and augmented reality (AR). In this article, we propose a solution based on both deep learning and augmented reality to make the text reading process more efficient, clearer, and safer. The purpose of the system is to help visually impaired people read text from natural images. First, the user hovers their smartphone's camera over text present in their environment. Then, the system executes the detection and recognition module using the DL model. Finally, the system uses AR to display the associated graphical data, augmented on the identified text, on the smartphone screen. AR is used to improve the visualization of the detected and recognized words so that the user can read the text more efficiently. The mobile application provides high-level visual features that improve the reading of the detected and recognized text. To validate the system's performance, the application is tested with a group of people who answer a questionnaire reflecting their experience with the proposed approach. In addition, a user study is performed to assess user-friendliness and satisfaction.
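
The three steps described in the abstract (camera capture, DL-based detection and recognition, AR display) can be illustrated with a minimal Python sketch. The abstract does not name the model, the AR toolkit, or any data structures, so every name below (`RecognizedWord`, `Overlay`, `build_overlays`, `process_frame`, `fake_model`, and the `scale` parameter) is a hypothetical placeholder, and the stub model stands in for a real text detector and recognizer.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# (x, y, width, height) of a word in frame pixels; an assumed convention.
Box = Tuple[int, int, int, int]


@dataclass
class RecognizedWord:
    """Hypothetical output of the detection and recognition module."""
    text: str
    box: Box


@dataclass
class Overlay:
    """Hypothetical AR annotation: enlarged text anchored to a word's box."""
    text: str
    anchor: Tuple[int, int]
    font_px: int


def build_overlays(words: List[RecognizedWord], scale: float = 1.5) -> List[Overlay]:
    """Turn recognized words into overlays whose font is `scale` times the box height."""
    overlays = []
    for word in words:
        x, y, _, height = word.box
        overlays.append(Overlay(text=word.text, anchor=(x, y), font_px=round(height * scale)))
    return overlays


def process_frame(
    frame: Any,
    detect_and_recognize: Callable[[Any], List[RecognizedWord]],
    scale: float = 1.5,
) -> List[Overlay]:
    """Run one camera frame through the model, then prepare the AR overlays."""
    words = detect_and_recognize(frame)
    return build_overlays(words, scale)


def fake_model(frame: Any) -> List[RecognizedWord]:
    """Stub standing in for a trained DL model; it ignores the frame entirely."""
    return [RecognizedWord(text="EXIT", box=(40, 20, 120, 32))]


overlays = process_frame(frame=None, detect_and_recognize=fake_model)
for overlay in overlays:
    print(overlay)
```

Passing the model in as a callable keeps the sketch independent of any particular network or AR framework. A real application would replace `fake_model` with an actual detector and recognizer and hand the overlays to its rendering layer.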

Keywords

  • Text detection
  • Text recognition
  • Natural image
  • Augmented Reality
  • Deep Learning

Corresponding author

Correspondence to Imene Ouali.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ouali, I., Halima, M.B., Wali, A. (2022). Text Detection and Recognition Using Augmented Reality and Deep Learning. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 449. Springer, Cham. https://doi.org/10.1007/978-3-030-99584-3_2