Abstract
This paper presents a new methodology for segmenting Arabic handwritten document images into distinct entities, namely text lines and words. Based on features of Arabic script, the connected components of a document image are divided into three main subsets, and the Hough transform is applied to them to segment the text lines. To improve the result where Hough-based text line detection fails, the authors add a skeletonization-based postprocessing stage that handles possible false alarms and efficiently segments vertically connected characters. Word segmentation is treated as a two-class problem. The authors fuse convex and Euclidean distance metrics to compute the distance between neighboring overlapped components, and each distance is then classified within a Gaussian mixture modeling framework as either an intra-word or an inter-word distance. The performance of the proposed method is assessed with a consistent, well-defined evaluation method whose measures allow the segmentation results to be compared against those of other strong methods. In experiments on two Arabic handwriting datasets, IFN/ENIT and AHDB, the proposed method showed higher efficiency and accuracy.
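The two-class word segmentation step described above can be illustrated with a minimal sketch. It assumes each gap between neighboring components has already been reduced to a single scalar distance (the abstract does not say how the convex and Euclidean metrics are fused, so that step is left out), and it fits a one-dimensional, two-component Gaussian mixture with plain EM. The function names, the initialisation, and the sample values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(gaps, iters=100):
    """Fit a two-component 1-D Gaussian mixture to gap distances using EM.

    Returns (weights, means, variances), each a list of two floats.
    """
    if len(gaps) < 2:
        raise ValueError("need at least two gap distances")
    xs = sorted(gaps)
    half = len(xs) // 2
    # Assumed initialisation: smaller half seeds one component, larger half the other.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) or 1.0
    variances = [(spread / 4.0) ** 2, (spread / 4.0) ** 2]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each gap.
        resp = []
        for x in gaps:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component received no mass; keep its old parameters
            means[k] = sum(r[k] * x for r, x in zip(resp, gaps)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, gaps)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component
            weights[k] = nk / len(gaps)
    return weights, means, variances


def classify_gap(x, model):
    """Label a gap by the mixture component with the larger weighted density."""
    weights, means, variances = model
    inter = 0 if means[0] > means[1] else 1  # larger-mean component = inter-word
    p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
    return "inter-word" if p[inter] >= p[1 - inter] else "intra-word"


# Hypothetical gap distances (in pixels) measured along one text line.
sample_gaps = [2.0, 3.0, 2.5, 3.5, 2.0, 3.0, 14.0, 15.0, 16.0, 15.5]
model = fit_two_gaussians(sample_gaps)
labels = [classify_gap(g, model) for g in sample_gaps]
```

Treating the component with the larger mean as the inter-word class is a design assumption of this sketch; it relies on gaps between words being wider on average than gaps inside a word.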
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ali, A.A.A., Suresha, M. (2019). Efficient Algorithms for Text Lines and Words Segmentation for Recognition of Arabic Handwritten Script. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_32
DOI: https://doi.org/10.1007/978-981-13-5953-8_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5952-1
Online ISBN: 978-981-13-5953-8
eBook Packages: Engineering, Engineering (R0)