Ligature categorization based Nastaliq Urdu recognition using deep neural networks

  • Muhammad Jawad Rafeeq
  • Zia ur Rehman
  • Ahmad Khan
  • Iftikhar Ahmed Khan
  • Waqas Jadoon
S.I.: CMKBO

Abstract

The cursive nature of the script, the Nastaliq writing style, and the large number of distinct ligatures make ligature recognition in Urdu very difficult. In this paper, we present a segmentation-free approach that recognizes Urdu ligatures holistically. First, we generate a rich dataset of 17,010 ligatures with different orientations and different degrees of noise. Second, the ligatures are clustered (categorized) to reduce the search space and make learning more robust. Finally, we employ a deep neural network with dropout regularization to classify the ligatures. Detailed experiments show that combining a deep neural network with dropout regularization and ligature clustering significantly enhances classification accuracy.
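
The two ideas named in the abstract, categorizing ligatures to shrink the search space and regularizing the classifier with dropout, can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid assignment, the inverted-dropout variant, the toy 2-D features, and all names (`nearest`, `dropout`, `candidate_labels`, `labels_by_cluster`, the `lig_*` labels) are assumptions made for illustration only.

```python
import random


def nearest(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist2(vec, centroids[i]))


def dropout(activations, rate, rng, training=True):
    """Inverted dropout (assumed variant): during training, zero each unit
    with probability `rate` and rescale the survivors by 1 / (1 - rate) so
    the expected activation is unchanged. At inference, pass values through."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical toy setup: two ligature categories, each with its own labels.
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels_by_cluster = {
    0: ["lig_a", "lig_b"],
    1: ["lig_c", "lig_d", "lig_e"],
}


def candidate_labels(features):
    """Stage 1 picks the category; a per-category classifier (stage 2)
    then only has to discriminate among that category's labels."""
    return labels_by_cluster[nearest(features, centroids)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(candidate_labels((9.0, 11.0)))
    print(dropout([1.0, 2.0, 3.0], rate=0.5, rng=rng))
```

In this sketch the classifier for a given input never has to consider labels outside the selected category, which is the sense in which categorization reduces the search space; how categories are actually formed and how many there are is left open here.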

Keywords

Ligatures, Nastaliq, Deep neural network, Classification, Categorization

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, COMSATS Institute of Information Technology, Vehari, Pakistan
  2. Department of Computer Science, COMSATS Institute of Information Technology, Abbottabad, Pakistan
