Recognition of offline handwritten Urdu characters using RNN and LSTM models

Multimedia Tools and Applications

Abstract

Optical Character Recognition (OCR) converts scanned documents and images into searchable, editable content. OCR is language dependent, and, unlike English or Hindi, very little research has been carried out in this field for Urdu and Urdu-like scripts (e.g., Farsi and Arabic). This gap is attributed to the lack of publicly available benchmark databases and to the inherent complexities of these scripts, such as their cursive nature and the change in the shape of a character depending on its position in a ligature. Each character has 2–4 different shapes depending on its position in the word: initial, medial, or final. In this article, we propose a methodology to automate the data collection process, collect a large handwritten dataset of 110,785 Urdu characters, and present a comparative analysis of two deep learning models, SimpleRNN and LSTM, to showcase the potential of RNN models for character recognition. Data was collected from 250 authors on A4-size sheets. Each sheet contains 132 shapes for Urdu characters and 10 numerals. As far as the authors know, this is the first time that such a large dataset has been proposed that contains all the possible shapes of Urdu characters as well as numerals. Experiments were run separately on the numerals, the full characters, and the whole dataset to compare the classification capabilities of the RNN and LSTM models. Despite the inherent complexities of Urdu script, the RNN and LSTM models proved effective in achieving high accuracy rates. The RNN achieved 96.96% for numerals, 85.22% for full characters, and 73.62% for the whole dataset; the LSTM outperformed it in every category, with maximum accuracies of 97.80% for numerals, 97.43% for full characters, and 91.30% for the whole dataset.
In addition, the proposed dataset opens a new window for future research, with potential for data analysis not only for Urdu but also for other languages, such as Arabic and Persian, that use similar character sets.
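
To make the reported comparison easier to read, the short Python sketch below tabulates the accuracies quoted in the abstract and derives two quantities from them: the LSTM gain over SimpleRNN in percentage points, and the share of SimpleRNN errors that the LSTM removes. The dictionary layout and the function names (`lstm_gain`, `relative_error_reduction`) are illustrative assumptions, not part of the article; only the six accuracy figures come from the source.

```python
# Accuracies (%) as quoted in the abstract. The structure and names
# below are hypothetical and used only for illustration.
REPORTED = {
    "numerals": {"SimpleRNN": 96.96, "LSTM": 97.80},
    "full characters": {"SimpleRNN": 85.22, "LSTM": 97.43},
    "whole dataset": {"SimpleRNN": 73.62, "LSTM": 91.30},
}


def lstm_gain(scores):
    """LSTM improvement over SimpleRNN, in percentage points."""
    return round(scores["LSTM"] - scores["SimpleRNN"], 2)


def relative_error_reduction(scores):
    """Share (%) of SimpleRNN misclassifications that the LSTM removes."""
    rnn_error = 100.0 - scores["SimpleRNN"]
    lstm_error = 100.0 - scores["LSTM"]
    return round(100.0 * (rnn_error - lstm_error) / rnn_error, 1)


# Derived summaries, keyed by data category.
GAINS = {name: lstm_gain(s) for name, s in REPORTED.items()}
REDUCTIONS = {name: relative_error_reduction(s) for name, s in REPORTED.items()}

if __name__ == "__main__":
    for name in REPORTED:
        print(f"{name}: +{GAINS[name]} points, "
              f"{REDUCTIONS[name]}% fewer errors with LSTM")
```

Read this way, the gap between the two models is small on the 10 numerals and widens as the task grows to the full characters and then to the whole dataset.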

Author information

Corresponding author

Correspondence to Munish Kumar.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Misgar, M.M., Mushtaq, F., Khurana, S.S. et al. Recognition of offline handwritten Urdu characters using RNN and LSTM models. Multimed Tools Appl 82, 2053–2076 (2023). https://doi.org/10.1007/s11042-022-13320-1

  • DOI: https://doi.org/10.1007/s11042-022-13320-1