
Efficient Algorithms for Text Lines and Words Segmentation for Recognition of Arabic Handwritten Script

  • Conference paper
Emerging Research in Computing, Information, Communication and Applications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 882)

Abstract

This paper presents a new methodology for segmenting Arabic handwritten document images into distinct entities, namely text lines and words. Based on features of Arabic script, the connected components of a document image are divided into three main subsets, and the Hough transform is applied to them to segment the text lines. To handle cases where Hough-based text line detection fails, a skeletonization-based postprocessing stage corrects possible false alarms and efficiently segments vertically connected characters. Word segmentation is treated as a two-class problem. The authors fuse convex and Euclidean distance metrics to compute the distance between neighboring overlapped components, and a Gaussian mixture model then classifies each distance as either intra-word or inter-word. Performance is assessed with a consistent, specific evaluation methodology whose measures allow the segmentation results to be compared against those of other strong approaches. In experiments conducted on two different Arabic handwriting datasets, IFN/ENIT and AHDB, the proposed method showed higher efficiency and accuracy.
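The two-class view of word segmentation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the fused convex/Euclidean distance has already been reduced to one scalar gap per pair of neighboring components, fits a two-component one-dimensional Gaussian mixture with plain EM, and labels a gap as inter-word when the component with the larger mean is more probable. The names `fit_two_gaussians` and `classify_gap` and the sample gap values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(gaps, iterations=50):
    """Fit a two-component 1-D Gaussian mixture to gap distances with EM.

    Hypothetical helper: `gaps` is assumed to hold one scalar distance per
    pair of neighboring components (the paper's fused metric is not modeled).
    """
    lo, hi = min(gaps), max(gaps)
    means = [lo, hi]  # start one component at each extreme
    spread = max((hi - lo) / 4.0, 1e-3)
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior responsibility of each component for each gap.
        resp = []
        for g in gaps:
            dens = [weights[k] * gaussian_pdf(g, means[k], variances[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(gaps)
            means[k] = sum(r[k] * g for r, g in zip(resp, gaps)) / nk
            var = sum(r[k] * (g - means[k]) ** 2 for r, g in zip(resp, gaps)) / nk
            variances[k] = max(var, 1e-3)  # variance floor keeps the pdf finite
    return weights, means, variances


def classify_gap(gap, model):
    """Return 'inter-word' if the larger-mean component is more probable."""
    weights, means, variances = model
    dens = [weights[k] * gaussian_pdf(gap, means[k], variances[k]) for k in range(2)]
    wide = 0 if means[0] > means[1] else 1  # component modeling the larger gaps
    return "inter-word" if dens[wide] > dens[1 - wide] else "intra-word"


if __name__ == "__main__":
    # Made-up gap distances (in pixels) purely for illustration.
    sample_gaps = [2.0, 3.0, 2.5, 3.5, 1.5, 14.0, 16.0, 15.0, 13.0]
    model = fit_two_gaussians(sample_gaps)
    for gap in (3.0, 14.5):
        print(gap, classify_gap(gap, model))
```

Fitting the mixture per document, rather than fixing a global threshold, lets the decision boundary adapt to each writer's spacing, which is the usual motivation for a mixture-model formulation of gap classification.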

Author information

Corresponding author

Correspondence to Amani Ali Ahmed Ali.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ali, A.A.A., Suresha, M. (2019). Efficient Algorithms for Text Lines and Words Segmentation for Recognition of Arabic Handwritten Script. In: Shetty, N., Patnaik, L., Nagaraj, H., Hamsavath, P., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Advances in Intelligent Systems and Computing, vol 882. Springer, Singapore. https://doi.org/10.1007/978-981-13-5953-8_32

  • DOI: https://doi.org/10.1007/978-981-13-5953-8_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5952-1

  • Online ISBN: 978-981-13-5953-8

  • eBook Packages: Engineering, Engineering (R0)
