Overlapping and multi-touching text-line segmentation by Block Covering analysis

  • Abderrazak Zahour
  • Bruno Taconet
  • Laurence Likforman-Sulem (corresponding author)
  • Wafa Boussellaa
Theoretical Advances

Abstract

This paper presents a new approach for text-line segmentation based on Block Covering which solves the problem of overlapping and multi-touching components. Block Covering is the core of a system which processes a set of ancient Arabic documents from historical archives. The system is designed for separating text-lines even if they are overlapping and multi-touching. We exploit the Block Covering technique in three steps: a new fractal analysis (Block Counting) for document classification, a statistical analysis of block heights for block classification and a neighboring analysis for building text-lines. The Block Counting fractal analysis, associated with a fuzzy C-means scheme, is performed on document images in order to classify them according to their complexity: tightly (closely) spaced documents (TSD) or widely spaced documents (WSD). An optimal Block Covering is applied on TSD documents which include overlapping and multi-touching lines. The large blocks generated by the covering are then segmented by relying on the statistical analysis of block heights. The final labeling into text-lines is based on a block neighboring analysis. Experimental results provided on images of the Tunisian Historical Archives reveal the feasibility of the Block Covering technique for segmenting ancient Arabic documents.
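The abstract does not spell out the Block Counting computation, so the following is only a rough sketch of the general idea under stated assumptions: it uses the classical box-counting estimate of fractal dimension as a stand-in for the paper's Block Counting analysis, and a textbook one-dimensional fuzzy C-means to split the resulting values into two groups (for instance TSD and WSD). The function names `box_count`, `box_counting_dimension`, and `fuzzy_cmeans_1d`, the box sizes, and the fuzzifier `m` are all illustrative choices, not taken from the paper.

```python
import numpy as np

def box_count(img, size):
    """Count size x size boxes that contain at least one foreground pixel."""
    h, w = img.shape
    # Pad with background so both dimensions are multiples of `size`.
    padded = np.pad(img, ((0, (-h) % size), (0, (-w) % size)),
                    mode="constant", constant_values=False)
    H, W = padded.shape
    blocks = padded.reshape(H // size, size, W // size, size)
    return int(blocks.any(axis=(1, 3)).sum())

def box_counting_dimension(img, sizes=(1, 2, 4, 8, 16, 32)):
    """Classical box-counting estimate (assumed stand-in for Block Counting)."""
    img = np.asarray(img, dtype=bool)
    counts = np.array([box_count(img, s) for s in sizes], dtype=float)
    if np.any(counts == 0):
        raise ValueError("image contains no foreground pixels")
    # Slope of log N(s) against log(1/s) estimates the dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return float(slope)

def fuzzy_cmeans_1d(values, c=2, m=2.0, iters=100):
    """Textbook fuzzy C-means on scalar features; returns (centers, memberships)."""
    x = np.asarray(values, dtype=float)
    centers = np.linspace(x.min(), x.max(), c)  # deterministic initialisation
    for _ in range(iters):
        dist = np.abs(x[:, None] - centers[None, :]) + 1e-12
        u = dist ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)       # memberships sum to 1 per sample
        um = u ** m
        centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
    return centers, u
```

In such a sketch, each binarized page image would yield one dimension value, and the page would be assigned to the cluster with the highest membership. Which cluster corresponds to tightly spaced documents would have to be decided from the data; the abstract does not say.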

Keywords

Block covering; Text-line segmentation; Overlapping and multi-touching lines; Block Counting; Ancient Arabic documents

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Abderrazak Zahour (1)
  • Bruno Taconet (1)
  • Laurence Likforman-Sulem (2), corresponding author
  • Wafa Boussellaa (3)
  1. IUT, Université du Havre/GED, Le Havre, France
  2. TELECOM ParisTech/TSI and CNRS-LTCI, Paris, France
  3. Université de Sfax, REGIM, Sfax (BPW), Tunisia