Abstract
This chapter describes a complete system for the recognition of unconstrained handwritten Arabic words using over-segmentation of characters and a variable duration hidden Markov model (VDHMM). First, a segmentation algorithm based on morphology and linguistic information translates the 2D image into a 1D sequence of subcharacter symbols. This sequence of symbols is modeled by a single contextual VDHMM. Two information sources are generally associated with written text: shape information and linguistic information. Forty-five features are selected to represent the shape information of character and subcharacter symbols in the feature space. The shape information of each character symbol, i.e., a feature vector, is modeled either as an independently distributed multivariate discrete distribution or as a joint continuous distribution. Linguistic knowledge about character transitions is modeled as a Markov chain, where each character in the alphabet is a state and the bigram probabilities are the state transition probabilities. In this context, the variable duration state handles the segmentation ambiguity between consecutive characters. We outline the substantial effort expended to create a corpus of handwritten Arabic words and of characters extracted from those words. Using this corpus and the IFN dataset 2003, we report detailed experimental results that demonstrate the effectiveness of the proposed scheme.
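To make the modeling idea concrete, the following is a minimal Python sketch of a variable-duration Viterbi search over an over-segmented word. It is an illustration under stated assumptions, not the authors' implementation: the function names `bigram_model` and `vd_viterbi`, the add-alpha smoothing, the duration table `dur`, and the toy `shape_score` (a stand-in for the likelihood of the 45-dimensional shape feature vector) are all hypothetical. Each state is a character, transitions are character bigram probabilities, and a state may consume between 1 and `max_dur` consecutive segments.

```python
import math

def bigram_model(words, alphabet, alpha=1.0):
    """Estimate initial and character-bigram probabilities (add-alpha smoothing)."""
    init = {c: alpha for c in alphabet}
    counts = {a: {b: alpha for b in alphabet} for a in alphabet}
    for w in words:
        if not w:
            continue
        init[w[0]] += 1.0
        for a, b in zip(w, w[1:]):
            counts[a][b] += 1.0
    z = sum(init.values())
    init = {c: v / z for c, v in init.items()}
    trans = {a: {b: v / sum(row.values()) for b, v in row.items()}
             for a, row in counts.items()}
    return init, trans

def vd_viterbi(n_segments, alphabet, init, trans, dur, score, max_dur):
    """Best character sequence where each character spans 1..max_dur segments.

    score(c, start, end): log-likelihood that segments[start:end], merged,
    form character c (hypothetical shape model).
    dur[c][d]: probability that character c is split into d segments.
    Returns (log probability, [(character, duration), ...]).
    """
    neg = float("-inf")
    # best[t][c]: best log score for the first t segments, last character c.
    best = [{c: neg for c in alphabet} for _ in range(n_segments + 1)]
    back = [{c: None for c in alphabet} for _ in range(n_segments + 1)]
    for t in range(1, n_segments + 1):
        for c in alphabet:
            for d in range(1, min(max_dur, t) + 1):
                p_d = dur[c].get(d, 0.0)
                if p_d <= 0.0:
                    continue
                local = math.log(p_d) + score(c, t - d, t)
                if t == d:  # c is the first character of the word
                    options = [(math.log(init[c]) + local, None)]
                else:
                    options = [(best[t - d][p] + math.log(trans[p][c]) + local, p)
                               for p in alphabet if best[t - d][p] > neg]
                for cand, prev in options:
                    if cand > best[t][c]:
                        best[t][c] = cand
                        back[t][c] = (d, prev)
    last = max(alphabet, key=lambda c: best[n_segments][c])
    if best[n_segments][last] == neg:
        raise ValueError("no valid segmentation path")
    # Trace back the chosen (character, duration) pairs.
    path, t, c = [], n_segments, last
    while c is not None:
        d, prev = back[t][c]
        path.append((c, d))
        t, c = t - d, prev
    return best[n_segments][last], path[::-1]

# Toy example: the character "a" was over-segmented into two pieces.
segments = ["a_left", "a_right", "b_whole"]
good = {("a", ("a_left", "a_right")), ("b", ("b_whole",))}

def shape_score(c, start, end):
    # Hypothetical shape likelihood: high for matching groupings, low otherwise.
    return math.log(0.9 if (c, tuple(segments[start:end])) in good else 0.01)

init, trans = bigram_model(["ab", "ba", "ab"], "ab")
dur = {"a": {1: 0.5, 2: 0.5}, "b": {1: 0.8, 2: 0.2}}
logp, path = vd_viterbi(len(segments), "ab", init, trans, dur, shape_score, max_dur=2)
print(path)
```

On this toy input the search merges the first two segments into one "a" and assigns the third to "b", which is how a variable duration state can absorb segmentation ambiguity without a separate resegmentation step.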
Acknowledgements
We would like to thank Linda Van Guilder, Ben Huyck, and Jon Phillips for their work on the VDHMM system, and Jon again for his work on the HTK system.
Copyright information
© 2012 Springer-Verlag London
About this chapter
Cite this chapter
Kundu, A., Hines, T. (2012). Arabic Handwriting Recognition Using VDHMM and Over-segmentation. In: Märgner, V., El Abed, H. (eds) Guide to OCR for Arabic Scripts. Springer, London. https://doi.org/10.1007/978-1-4471-4072-6_21
DOI: https://doi.org/10.1007/978-1-4471-4072-6_21
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4071-9
Online ISBN: 978-1-4471-4072-6
eBook Packages: Computer Science, Computer Science (R0)