Abstract
The bidirectional long short term memory (BLSTM) architecture, a special case of the recurrent neural network, has been successfully applied to the recognition of Urdu Nastaleeq sentence images based on character information. In such cases, manually labeling the characters in sentences for a large dataset is labor-intensive, because identical characters take different shapes at different positions inside ligatures and words. Labeling a dataset with ligatures, on the other hand, is relatively easier and more accurate. In this paper, we propose a novel gated BLSTM (GBLSTM) model for recognition of printed Urdu Nastaleeq text based on ligature information. The proposed model uses raw pixel values as features instead of human-crafted features, because the latter are more error-prone. The model is trained on undegraded images and tested on unseen, artificially degraded versions of the Urdu printed text images dataset. The proposed GBLSTM model achieves a recognition accuracy of 96.71%, which is higher than that of prevalent Urdu optical character recognition systems.
Notes
UPTIs dataset is provided by faisal.shafait@uwa.edu.au and adnan@cs.uni-kl.de.
Acknowledgements
The authors would like to thank Dr. Ruifan Li, Center for Intelligence of Science and Technology (CIST), Beijing University of Posts and Telecommunications, China. We would also like to thank Mr. Mohammad Saad Khan, Beijing University of Posts and Telecommunications, China for his guidance and unprecedented help.
Cite this article
Ahmad, I., Wang, X., Mao, Y.h. et al. Ligature based Urdu Nastaleeq sentence recognition using gated bidirectional long short term memory. Cluster Comput 21, 703–714 (2018). https://doi.org/10.1007/s10586-017-0990-5
DOI: https://doi.org/10.1007/s10586-017-0990-5