Classification of Handwritten Document Image into Text and Non-Text Regions

  • V. Vidya
  • T. R. Indhu
  • V. K. Bhadran
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 222)


Segmenting a document image into text and non-text regions is an essential step in document layout analysis, which is itself one of the preprocessing stages of optical character recognition. Handwritten documents usually have no specific layout and may contain non-text regions such as diagrams, graphics, and tables. In this work we propose a novel approach to segmenting text and non-text components in Malayalam handwritten document images using a Simplified Fuzzy ARTMAP (SFAM) classifier. The binarized document image is dilated horizontally and vertically, and the two results are merged. Connected component labelling is then performed on the smeared image. A set of geometrical and statistical features is extracted from each component and passed to the SFAM, which classifies the component as text or non-text. Experimental results are promising, and the approach can be extended to other scripts.
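The preprocessing stages named in the abstract (directional dilation, merging, connected component labelling, per-component features) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the function names, the dilation radii, merging by logical OR, 8-connectivity, and the particular features (width, height, aspect ratio, ink density). The abstract does not specify any of them, and the SFAM classifier itself is not shown.

```python
from collections import deque


def dilate_1d(img, radius, horizontal=True):
    """Dilate a binary image (list of rows of 0/1) along a single axis."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            # Spread each foreground pixel `radius` steps in both directions.
            for d in range(-radius, radius + 1):
                yy, xx = (y, x + d) if horizontal else (y + d, x)
                if 0 <= yy < h and 0 <= xx < w:
                    out[yy][xx] = 1
    return out


def smear(img, rx, ry):
    """Merge the horizontal and vertical dilations (assumed here: logical OR)."""
    hor = dilate_1d(img, rx, horizontal=True)
    ver = dilate_1d(img, ry, horizontal=False)
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(hor, ver)]


def label_components(img):
    """Return the 8-connected components as lists of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def component_features(pixels, original):
    """Illustrative geometric and statistical features of one smeared component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    # Ink = original foreground pixels that fall inside the smeared component.
    ink = sum(original[y][x] for y, x in pixels)
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "ink_density": ink / (width * height),
    }


# Toy binarized page: two nearby dots on one row and a short vertical stroke.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
smeared = smear(page, rx=1, ry=0)
features = [component_features(c, page) for c in label_components(smeared)]
```

On this toy page the two dots are separate components before smearing and fuse into a single component afterwards, which is the purpose of the smearing step: neighbouring strokes are grouped into larger blocks before features are computed. In a full system, each feature dictionary would be converted to a normalized vector and passed to the classifier.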


Text and non-text regions, Segmentation, Simplified fuzzy ARTMAP



Copyright information

© Springer India 2013

Authors and Affiliations

  1. Centre for Development of Advanced Computing, Thiruvananthapuram, India
