
A Novel Baseline Detection Method of Handwritten Arabic-Script Documents Based on Sub-Words

  • Tarik Abu-Ain
  • Siti Norul Huda Sheikh Abdullah
  • Bilal Bataineh
  • Khairuddin Omar
  • Ashraf Abu-Ein
Part of the Communications in Computer and Information Science book series (CCIS, volume 378)

Abstract

Baseline detection is an important process in document image analysis and recognition systems. It is used extensively in many preprocessing stages, such as text normalization, skew correction, character segmentation, and slant and slope correction, as well as in feature extraction. In this work, we propose a new baseline detection method for Arabic script based on the horizontal projection histogram and the direction features of sub-word skeletons. Sub-words form the main components of Arabic text: each consists of one or more letters, together with their diacritics and dots. Experimental results on the IFN/ENIT Arabic benchmark dataset demonstrate the efficiency of the proposed method.
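As a rough illustration of the projection-histogram idea mentioned in the abstract, the following Python sketch estimates a baseline row from a small binary word image. It is not the authors' method: it leaves out the skeleton direction features entirely. The functions `connected_components`, `horizontal_projection` and `estimate_baseline`, and the `min_pixels` threshold for discarding dots and diacritics, are all hypothetical. The sketch assumes the image is already binarized (1 = ink) and deskewed, and it treats 8-connected components above the size threshold as sub-word bodies.

```python
from collections import deque
from typing import List, Tuple

# Binary image as nested lists: 1 = ink (foreground), 0 = background.
# Assumed already binarized and deskewed; row indices grow downward.
Image = List[List[int]]
Pixel = Tuple[int, int]


def connected_components(img: Image) -> List[List[Pixel]]:
    """Return the 8-connected foreground components as lists of (row, col) pixels."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def horizontal_projection(pixels: List[Pixel], rows: int) -> List[int]:
    """Count ink pixels per row (the horizontal projection histogram)."""
    hist = [0] * rows
    for y, _ in pixels:
        hist[y] += 1
    return hist


def estimate_baseline(img: Image, min_pixels: int = 3) -> int:
    """Estimate the baseline as the peak row of the projection of sub-word bodies.

    Components with fewer than `min_pixels` pixels (a hypothetical threshold)
    are treated as dots or diacritics and excluded before projecting.
    """
    bodies = [comp for comp in connected_components(img) if len(comp) >= min_pixels]
    if not bodies:
        raise ValueError("no sub-word bodies found")
    hist = horizontal_projection([px for comp in bodies for px in comp], len(img))
    # max() returns the first (topmost) row when several rows tie.
    return max(range(len(hist)), key=hist.__getitem__)


if __name__ == "__main__":
    word = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 1, 0],  # vertical strokes
        [0, 1, 0, 0, 0, 0, 1, 0],
        [0, 1, 1, 1, 1, 1, 1, 0],  # horizontal connecting stroke
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0],  # isolated dot below the word
    ]
    print(estimate_baseline(word))  # prints 3
```

Taking the histogram peak works because Arabic letters within a sub-word are typically joined along the baseline, so that row collects the most ink. Removing small components first stops a row of dots from producing a spurious peak. Handwriting that is curved or skewed within a word generally needs more than a single global peak, which is where features such as skeleton directions come in.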

Keywords

Preprocessing, Text normalization, Arabic handwriting, Baseline detection, Sub-word extraction

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tarik Abu-Ain (1)
  • Siti Norul Huda Sheikh Abdullah (1)
  • Bilal Bataineh (1)
  • Khairuddin Omar (1)
  • Ashraf Abu-Ein (2)

  1. Pattern Recognition Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
  2. Computer Engineering Department, Al-Balqa’ Applied University, Faculty of Engineering Technology, Amman, Jordan