UrduDeepNet: offline handwritten Urdu character recognition using deep neural network

  • Original Article
  • Neural Computing and Applications

Abstract

Handwritten Urdu character recognition systems face several challenges, including writer-dependent variations and the lack of benchmark databases for cursive scripts. In this study, we propose a handwritten Urdu character dataset for the Nasta’liq writing style, covering isolated and positional characters as well as numerals. We also propose a convolutional neural network (CNN) architecture for the recognition of handwritten Urdu characters and numerals. A CNN is a novel technique for image recognition that does not need explicit feature engineering and extraction and produces efficient results compared with standard handcrafted feature extraction approaches. The proposed system was trained on a training dataset of 74,285 samples and evaluated on a test dataset of 21,223 samples, achieving a recognition rate of 98.82% for 133 classes and outperforming the results of all state-of-the-art systems for the Urdu language.
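To put the reported figures in context, the short sketch below works through the arithmetic they imply: the size of the train/test split, the approximate number of test samples classified correctly, and the accuracy of uniform random guessing over 133 classes. It assumes the recognition rate is plain top-1 accuracy over the whole test set, which the abstract does not state; the function names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract.
TRAIN_SAMPLES = 74_285
TEST_SAMPLES = 21_223
NUM_CLASSES = 133
# Assumption: the 98.82% recognition rate is top-1 accuracy on the test set.
RECOGNITION_RATE = 0.9882


def split_summary(train, test):
    """Return the total sample count and the share held out for testing."""
    total = train + test
    return {"total": total, "test_fraction": test / total}


def implied_counts(test, rate):
    """Approximate (correct, wrong) test counts implied by an accuracy figure."""
    correct = round(test * rate)
    return correct, test - correct


summary = split_summary(TRAIN_SAMPLES, TEST_SAMPLES)
correct, wrong = implied_counts(TEST_SAMPLES, RECOGNITION_RATE)
# Accuracy of guessing uniformly at random among the classes.
chance = 1 / NUM_CLASSES

print(f"total samples: {summary['total']}")
print(f"test share: {summary['test_fraction']:.1%}")
print(f"approx. correct / wrong on test: {correct} / {wrong}")
print(f"uniform-guess baseline over {NUM_CLASSES} classes: {chance:.2%}")
```

Under these assumptions, the split is roughly 78% training to 22% test, and the reported rate corresponds to about 250 misclassified test samples. The counts are rounded estimates derived from the published percentage, not figures reported by the authors.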

Funding

No funding was received.

Author information

Corresponding author

Correspondence to Munish Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mushtaq, F., Misgar, M.M., Kumar, M. et al. UrduDeepNet: offline handwritten Urdu character recognition using deep neural network. Neural Comput & Applic 33, 15229–15252 (2021). https://doi.org/10.1007/s00521-021-06144-x

  • DOI: https://doi.org/10.1007/s00521-021-06144-x
