Abstract
Text line extraction is the first and one of the most critical steps in optical character recognition (OCR) of unconstrained handwritten documents. This work presents a new method that compares neighboring connected components to determine whether they belong to the same text line. Components that are very small or very large relative to the average component height are set aside during preprocessing. During post-processing, these components are reconsidered and assigned to the lines to which they most suitably belong. The technique is evaluated on the benchmark training dataset of the ICDAR 2009 Handwriting Segmentation Contest, which consists of English, French, German and Greek handwritten texts. The overall text line identification accuracy on this dataset is approximately 93.35%.
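The three stages named in the abstract (height-based filtering, neighbor comparison, and reassignment of the filtered components) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the comparison rule or the thresholds, so the vertical-overlap test, the nearest-centre reassignment, the `low`/`high` height ratios, and the names `Box` and `group_into_lines` are all assumptions made for illustration. Components are represented only by their bounding boxes.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical helper type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def cy(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.h / 2


def group_into_lines(boxes: list[Box], low: float = 0.5, high: float = 2.0) -> list[list[Box]]:
    """Group component boxes into text lines, ordered top to bottom.

    `low` and `high` are assumed ratios of the average component height;
    the source does not state the actual thresholds.
    """
    if not boxes:
        return []

    # Preprocessing: set aside components that are very small or very large.
    avg_h = mean(b.h for b in boxes)
    normal, deferred = [], []
    for b in boxes:
        (normal if low * avg_h <= b.h <= high * avg_h else deferred).append(b)

    # Neighbor comparison (assumed rule): scanning left to right, attach a
    # component to the line whose rightmost member overlaps it most vertically.
    lines: list[list[Box]] = []
    for b in sorted(normal, key=lambda c: c.x):
        best_line, best_overlap = None, 0
        for line in lines:
            last = line[-1]
            overlap = min(b.y + b.h, last.y + last.h) - max(b.y, last.y)
            if overlap > best_overlap:
                best_line, best_overlap = line, overlap
        if best_line is None:
            lines.append([b])  # no overlapping neighbor: start a new line
        else:
            best_line.append(b)

    # Post-processing (assumed rule): give each deferred component to the
    # line whose mean vertical centre is closest to its own.
    for b in deferred:
        if lines:
            min(lines, key=lambda ln: abs(mean(c.cy for c in ln) - b.cy)).append(b)
        else:
            lines.append([b])

    lines.sort(key=lambda ln: mean(c.cy for c in ln))
    return [sorted(ln, key=lambda c: c.x) for ln in lines]
```

For example, five boxes of height 20 arranged in two rows, plus a 2x3 dot near the lower row, come back as two lines, with the dot attached to the lower one during post-processing. A real implementation would also have to handle skewed, curved, and touching lines, which this sketch ignores.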
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Khandelwal, A., Choudhury, P., Sarkar, R., Basu, S., Nasipuri, M., Das, N. (2009). Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis. In: Chaudhury, S., Mitra, S., Murthy, C.A., Sastry, P.S., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2009. Lecture Notes in Computer Science, vol 5909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11164-8_60
DOI: https://doi.org/10.1007/978-3-642-11164-8_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11163-1
Online ISBN: 978-3-642-11164-8
eBook Packages: Computer Science (R0)