Text Region Extraction from Quality Degraded Document Images

  • S. Abirami
  • D. Manjula
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4815)


In this paper we present a method that uses edge information to extract textual blocks from gray-scale document images. It aims to detect textual regions in heavily noise-infected newspaper images and to separate them from graphical regions. The algorithm traces the feature points in different entities and then groups the edge points that belong to textual regions. Finally, feature-based connected component merging gathers homogeneous textual regions together within the scope of their bounding rectangles. The proposed method can be used to locate text in a group of newspaper images with multiple page layouts. Initial results were encouraging; the method was then tested on a considerable number of newspaper images with different layout structures, and promising results were obtained. Its major application is in digital libraries for OCR, where document quality varies with the age of the scanned paper.
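The stages named in the abstract (edge detection, grouping of edge points, and merging of components by their bounding rectangles) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the edge operator, the features used for merging, or any thresholds. The simple intensity-difference edge test, the 8-connected grouping, the horizontal-gap merging rule, and the names `edge_map`, `component_boxes`, `merge_boxes`, `thresh`, and `max_gap` are all assumptions chosen for illustration.

```python
from collections import deque

def edge_map(img, thresh=50):
    """Mark pixels whose intensity differs from the left or upper
    neighbour by more than thresh (an assumed, very simple edge test)."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = abs(img[y][x] - img[y][x - 1]) if x > 0 else 0
            gy = abs(img[y][x] - img[y - 1][x]) if y > 0 else 0
            edges[y][x] = max(gx, gy) > thresh
    return edges

def component_boxes(edges):
    """Group 8-connected edge pixels and return one bounding box
    (x0, y0, x1, y1) per component, in row-major discovery order."""
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not edges[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def merge_boxes(boxes, max_gap=3):
    """Repeatedly merge boxes that overlap vertically and whose
    horizontal coordinate gap is at most max_gap (assumed rule)."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                v_overlap = a[1] <= b[3] and b[1] <= a[3]
                h_gap = max(a[0], b[0]) - min(a[2], b[2])
                if v_overlap and h_gap <= max_gap:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

def extract_text_blocks(img, thresh=50, max_gap=3):
    """Run the three sketched stages on a gray-scale image given as a
    list of rows of integer intensities."""
    return merge_boxes(component_boxes(edge_map(img, thresh)), max_gap)
```

Calling `extract_text_blocks` on a gray-scale image stored as a list of rows returns the merged rectangles. A practical system would also need noise suppression and a test that separates textual from graphical components, which this sketch leaves out.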


Text Extraction, Edge Detection, Block Merging



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • S. Abirami (1)
  • D. Manjula (1)
  1. Department of Computer Science & Engg, Anna University, Chennai, India
