Urdu signboard detection and recognition using deep learning

  • 1177: Advances in Deep Learning for Multimodal Fusion and Alignment
  • Multimedia Tools and Applications

Abstract

Signboard detection and recognition is an important task in automated context-aware marketing. Recently, text in many scripts, such as Latin, Japanese, and Chinese, has been detected effectively by machine learning algorithms. Compared with these scripts, outdoor Urdu text needs further attention in detection and recognition because of its cursive nature. Urdu detection and recognition are also made difficult by wide variation in illumination, low resolution, and inconsistent font styles, colors, and backgrounds. To address the lack of resources for detecting Urdu text in outdoor environments, we propose a new Urdu-text signboard dataset with 467 ligature categories, containing more than 30K images for recognition and 700 annotated base images for detection. We also propose a three-phase methodology. In the first phase, text regions containing Urdu ligatures are detected in shop-signboard images by a faster region-based convolutional neural network (Faster R-CNN) using pre-trained CNNs such as AlexNet and VGG16. In the second phase, the regions detected in the first phase are clustered to identify the unique ligatures in the dataset. In the third phase, all detected regions are recognized by a trained 18-layer convolutional neural network. The proposed system achieves a precision of 87% and a recall of 96% for detection using the VGG16 model. For the classification of ligatures, a recognition rate of 97.50% is achieved. Ligature recognition was also evaluated using the bilingual evaluation understudy (BLEU) metric, achieving an encouraging score of 0.96 on the newly developed Urdu-Signboard dataset.
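
The abstract reports detection precision and recall and a BLEU score for ligature recognition. As a rough illustration of how such figures are commonly computed, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the counts, and the placeholder ligature tokens (`lig_1`, `lig_2`, ...) are hypothetical, and the BLEU variant shown (single reference, uniform weights up to 4-grams, no smoothing) is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
import math


def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from detection counts (hypothetical helper)."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(reference, candidate, max_n=4):
    """Single-reference BLEU with uniform weights and a brevity penalty.

    Assumption: no smoothing, so the score is 0.0 whenever any n-gram
    order has no match (or the candidate is shorter than max_n tokens).
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Illustrative counts only; these are not taken from the paper.
    print(precision_recall(true_pos=87, false_pos=13, false_neg=4))

    # Placeholder ligature labels standing in for recognised Urdu ligatures.
    truth = ["lig_1", "lig_2", "lig_3", "lig_4", "lig_5"]
    predicted = ["lig_1", "lig_2", "lig_3", "lig_4", "lig_9"]
    print(bleu(truth, predicted))
```

Treating each recognised ligature as one token lets BLEU reward correct ligature order as well as correct labels, which is one plausible reason to report it alongside the per-ligature recognition rate.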

Acknowledgments

The authors would like to acknowledge the Higher Education Commission (HEC) for supporting this work under NRPU Project No. 6338. This work was also supported by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under Project UIDB/EEA/50008/2020, and by the Brazilian National Council for Research and Development (CNPq) via Grant No. 309335/2017-5.

Author information

Corresponding author

Correspondence to Syed Yasser Arafat.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Arafat, S.Y., Ashraf, N., Iqbal, M.J. et al. Urdu signboard detection and recognition using deep learning. Multimed Tools Appl 81, 11965–11987 (2022). https://doi.org/10.1007/s11042-020-10175-2

  • DOI: https://doi.org/10.1007/s11042-020-10175-2
