Classification of Handwritten Document Image into Text and Non-Text Regions

Vidya, V.; Indhu, T. R.; Bhadran, V. K.

doi:10.1007/978-81-322-1000-9_10

Conference paper
First Online: 01 January 2013

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 222)

Abstract

Segmentation of a document image into text and non-text regions is an essential step in document layout analysis, which is itself one of the preprocessing steps in optical character recognition. Handwritten documents usually have no specific layout and may contain non-text regions such as diagrams, graphics, and tables. In this work we propose a novel approach to segmenting text and non-text components in Malayalam handwritten document images using a Simplified Fuzzy ARTMAP (SFAM) classifier. The binarized document image is dilated horizontally and vertically, and the two results are merged. Connected component labelling is then performed on the resulting smeared image. A set of geometrical and statistical features is extracted from each component and passed to the SFAM, which classifies the component as text or non-text. Experimental results are promising, and the approach can be extended to other scripts.
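
The preprocessing stages named in the abstract (directional dilation, merging, connected component labelling, per-component features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the dilation reach, the use of a pixelwise OR for merging, 8-connectivity, and the particular features (bounding box size, aspect ratio, pixel density) are all assumptions chosen for illustration, and the SFAM classifier itself is not included.

```python
from collections import deque


def dilate_1d(img, reach, horizontal=True):
    """Spread each foreground pixel `reach` steps along a single axis."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            for d in range(-reach, reach + 1):
                yy, xx = (y, x + d) if horizontal else (y + d, x)
                if 0 <= yy < h and 0 <= xx < w:
                    out[yy][xx] = 1
    return out


def smear(img, reach_h=1, reach_v=1):
    """Dilate horizontally and vertically, then merge the two images.

    Merging with a pixelwise OR is an assumption; the abstract only says
    the two dilated images are "merged together".
    """
    hor = dilate_1d(img, reach_h, horizontal=True)
    ver = dilate_1d(img, reach_v, horizontal=False)
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(hor, ver)]


def label_components(img):
    """Return one list of (row, col) pixels per 8-connected component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < h and 0 <= xx < w
                                and img[yy][xx] and not seen[yy][xx]):
                            seen[yy][xx] = True
                            queue.append((yy, xx))
            components.append(pixels)
    return components


def component_features(pixels):
    """Illustrative geometric features (hypothetical, not the paper's set)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "density": len(pixels) / (width * height),
    }


# Toy binarized page: two nearby strokes (top left) and one blob (bottom right).
page = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
components = label_components(smear(page))
feature_vectors = [component_features(c) for c in components]
```

On the toy page, smearing joins the two neighbouring strokes into a single component, so labelling yields two components instead of three. In a full pipeline, each entry of `feature_vectors` would be normalized and passed to a trained classifier such as SFAM to decide between text and non-text.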

Author information

Authors and Affiliations

Centre for Development of Advanced Computing, Thiruvananthapuram, Kerala, India
V. Vidya, T. R. Indhu & V. K. Bhadran

Corresponding author

Correspondence to V. Vidya.

Editor information

Editors and Affiliations

Computer Science & Engineering, Dr. N.G.P. Institute of Technology, Kalapatti Road, Coimbatore, 641048, Tamil Nadu, India
Mohan S
Electronics & Communication Engineering, Dr. N.G.P. Institute of Technology, Kalapatti Road, Coimbatore, 641048, Tamil Nadu, India
S Suresh Kumar

Cite this paper

Vidya, V., Indhu, T.R., Bhadran, V.K. (2013). Classification of Handwritten Document Image into Text and Non-Text Regions. In: S, M., Kumar, S. (eds) Proceedings of the Fourth International Conference on Signal and Image Processing 2012 (ICSIP 2012). Lecture Notes in Electrical Engineering, vol 222. Springer, India. https://doi.org/10.1007/978-81-322-1000-9_10

DOI: https://doi.org/10.1007/978-81-322-1000-9_10
Published: 11 January 2013
Publisher Name: Springer, India
Print ISBN: 978-81-322-0999-7
Online ISBN: 978-81-322-1000-9
eBook Packages: Engineering, Engineering (R0)
