Abstract
Handwritten Urdu character recognition systems face several challenges, including writer-dependent variation and the lack of benchmark databases for cursive scripts. In this study, we propose a handwritten Urdu character dataset for the Nasta’liq writing style that covers isolated characters, positional characters, and numerals. We also propose a convolutional neural network (CNN) architecture for recognizing handwritten Urdu characters and numerals. A CNN learns its features directly from the input images, so it needs no explicit feature engineering or extraction, and it compares favorably with standard handcrafted feature extraction approaches. The proposed system was trained on 74,285 samples and evaluated on a test set of 21,223 samples. It achieved a recognition rate of 98.82% across 133 classes, outperforming the results reported for state-of-the-art Urdu recognition systems.
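To put the reported figures in context, the short sketch below works through the arithmetic they imply: the train/test split, the approximate number of correctly recognized test samples, and the average number of test samples per class. Only the four constants come from the abstract. The variable names are illustrative, and the derived counts are approximations, since the recognition rate is quoted to two decimals and the per-class distribution is not stated here.

```python
# Figures quoted in the abstract.
TRAIN_SAMPLES = 74_285
TEST_SAMPLES = 21_223
NUM_CLASSES = 133
RECOGNITION_RATE = 0.9882  # reported rate, rounded to two decimals

# Overall dataset size and the share of each split.
total_samples = TRAIN_SAMPLES + TEST_SAMPLES
train_share = TRAIN_SAMPLES / total_samples
test_share = TEST_SAMPLES / total_samples

# Assumption: the recognition rate is plain accuracy on the test set,
# so the counts below are estimates recovered from a rounded figure.
approx_correct = round(RECOGNITION_RATE * TEST_SAMPLES)
approx_errors = TEST_SAMPLES - approx_correct

# Assumption: this is only an average; the abstract does not say
# whether the 133 classes are balanced.
avg_test_per_class = TEST_SAMPLES / NUM_CLASSES

print(f"total samples:        {total_samples}")
print(f"train / test share:   {train_share:.2%} / {test_share:.2%}")
print(f"approx. correct:      {approx_correct}")
print(f"approx. misclassified: {approx_errors}")
print(f"avg. test samples per class: {avg_test_per_class:.1f}")
```

Under these assumptions the split is roughly 78% training to 22% test, and a 98.82% recognition rate corresponds to about 250 misclassified test samples.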
Funding
No funding was received.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Mushtaq, F., Misgar, M.M., Kumar, M. et al. UrduDeepNet: offline handwritten Urdu character recognition using deep neural network. Neural Comput & Applic 33, 15229–15252 (2021). https://doi.org/10.1007/s00521-021-06144-x
DOI: https://doi.org/10.1007/s00521-021-06144-x