Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents

  • Rafi Cohen (corresponding author)
  • Itshak Dinstein
  • Jihad El-Sana
  • Klara Kedem
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8814)

Abstract

Text line extraction is a vital prerequisite for various document processing tasks. This paper presents a novel approach for text line extraction based on Gaussian scale space and a dedicated binarization that utilizes the inherent structure of smoothed text document images. The approach first enhances the text lines in the image using a multi-scale anisotropic second-derivative-of-Gaussian filter bank at the average height of the text line. It then applies a binarization that is based on a component tree and is tailored towards line extraction. The final stage of the algorithm uses an energy minimization framework to remove spurious text lines and assign connected components to lines. We have tested our approach on various datasets written in different languages over a range of image qualities and achieved high detection rates, which outperform state-of-the-art algorithms. Our MATLAB code is publicly available (http://www.cs.bgu.ac.il/~rafico/LineExtraction.zip).
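
The enhancement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: it assumes dark text on a light background, a separable filter (Gaussian smoothing along x, second derivative of a Gaussian along y), scale normalization by sigma_y squared, and a pointwise maximum over scales. The function names and the `elongation` parameter (the ratio sigma_x / sigma_y) are hypothetical and chosen only for illustration.

```python
import math

def gaussian_kernel(sigma, order=0):
    """Sampled 1-D Gaussian (order=0) or its second derivative (order=2)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
    total = sum(g)
    g = [v / total for v in g]  # unit sum, so smoothing preserves intensity
    if order == 0:
        return g
    # Second derivative of a Gaussian: (x^2 / sigma^4 - 1 / sigma^2) * g(x)
    d2 = [(x * x / sigma ** 4 - 1 / sigma ** 2) * v for x, v in zip(xs, g)]
    mean = sum(d2) / len(d2)
    return [v - mean for v in d2]  # zero sum: flat regions give no response

def convolve_1d(signal, kernel):
    """Filter a 1-D sequence with a symmetric kernel, replicating the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the image
            acc += w * signal[j]
        out.append(acc)
    return out

def filter_image(image, sigma_x, sigma_y):
    """Smooth along x, take the second derivative along y, normalize by scale."""
    kx = gaussian_kernel(sigma_x, order=0)
    ky = gaussian_kernel(sigma_y, order=2)
    rows = [convolve_1d(row, kx) for row in image]
    cols = [convolve_1d(col, ky) for col in zip(*rows)]
    return [[sigma_y ** 2 * v for v in row] for row in zip(*cols)]

def enhance_text_lines(image, sigmas_y, elongation=3.0):
    """Pointwise maximum of the scale-normalized responses over all scales.

    `elongation` (assumed value) stretches the filter along the x axis.
    """
    best = None
    for sy in sigmas_y:
        resp = filter_image(image, elongation * sy, sy)
        if best is None:
            best = resp
        else:
            best = [[max(a, b) for a, b in zip(ra, rb)]
                    for ra, rb in zip(best, resp)]
    return best

if __name__ == "__main__":
    # Synthetic page: white background with one dark band on rows 18..22.
    page = [[0.0 if 18 <= r <= 22 else 1.0 for _ in range(30)] for r in range(40)]
    response = enhance_text_lines(page, [2.0, 3.0, 4.0])
    column = [row[15] for row in response]
    print("strongest response at row", column.index(max(column)))
```

On this synthetic page the response peaks along the centre of the dark band and stays near zero on the flat background, which is the kind of ridge map a subsequent binarization stage would operate on.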

Keywords

  • Historical document processing
  • Text lines extraction

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Rafi Cohen (1), corresponding author
  • Itshak Dinstein (2)
  • Jihad El-Sana (1)
  • Klara Kedem (1)

  1. Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
  2. Department of Electrical and Computer Engineering, Ben-Gurion University, Beer-Sheva, Israel
