Abstract
This paper proposes a simple yet effective method for layout segmentation of document images. First, n × n blocks are roughly labeled as background, line, text, image, graphics, or mixed. Each mixed block is split into four sub-blocks, and the process repeats until no mixed block remains. Using a Savitzky-Golay derivative filter in the classification keeps the computation of features to a minimum. Next, the boundaries of each object are refined. Experimental results show that the method performs satisfactorily as a pre-processing step prior to OCR.
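The recursive block-splitting step described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the names `segment` and `toy_classify`, the `min_size` stopping guard, and the uniform-intensity classification rule are all assumptions introduced here. In particular, `toy_classify` is a hypothetical stand-in for the paper's Savitzky-Golay-based block classifier, which the abstract does not specify in detail.

```python
from typing import Callable, List, Sequence, Tuple

Block = Tuple[int, int, int, int]  # (top, left, height, width)
Image = Sequence[Sequence[int]]    # gray levels, row-major
MIXED = "mixed"


def toy_classify(image: Image, block: Block) -> str:
    """Hypothetical stand-in classifier (NOT the paper's method).

    A block with a single gray level is called "background" if bright
    and "text" if dark; anything else is treated as mixed.
    """
    top, left, h, w = block
    pixels = [image[r][c]
              for r in range(top, top + h)
              for c in range(left, left + w)]
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return "background" if lo >= 128 else "text"
    return MIXED


def segment(image: Image, block: Block,
            classify: Callable[[Image, Block], str],
            min_size: int = 2) -> List[Tuple[Block, str]]:
    """Label a block; split it into 4 sub-blocks while it is mixed.

    min_size is an assumed guard so the recursion always terminates;
    a block at that size keeps whatever label it received.
    """
    top, left, h, w = block
    label = classify(image, block)
    if label != MIXED or h <= min_size or w <= min_size:
        return [(block, label)]
    h2, w2 = h // 2, w // 2
    quads = [
        (top, left, h2, w2),                     # top-left
        (top, left + w2, h2, w - w2),            # top-right
        (top + h2, left, h - h2, w2),            # bottom-left
        (top + h2, left + w2, h - h2, w - w2),   # bottom-right
    ]
    labeled: List[Tuple[Block, str]] = []
    for quad in quads:
        labeled.extend(segment(image, quad, classify, min_size))
    return labeled


if __name__ == "__main__":
    # 4 x 4 toy page: a dark 2 x 2 patch in the top-left corner.
    page = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [255, 255, 255, 255],
        [255, 255, 255, 255],
    ]
    for blk, lab in segment(page, (0, 0, 4, 4), toy_classify):
        print(blk, lab)
```

Passing the classifier in as a parameter keeps the splitting logic independent of the features used, so a classifier based on derivative-filter responses could be substituted without changing `segment`.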
References
A. Savitzky and M.J.E. Golay, Smoothing and differentiation of data by simplified least squares procedure. Analytical Chemistry 36 (1964) 1627–1639
I. Keslassy, M. Kalman, D. Wang, and B. Girod, Classification of Compound Images Based on Transform Coefficient Likelihood. Proc. ICIP 2001 (2001)
In-Kwon Kim, Dong-Wook Jung, and Rae-Hong Park, Document image binarization based on topographic analysis using a water flow model. Pattern Recognition 35(1) (2002) 265–277
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Suvichakorn, A., Watcharabusaracum, S., Sinthupinyo, W. (2002). Simple Layout Segmentation of Gray-Scale Document Images. In: Lopresti, D., Hu, J., Kashi, R. (eds) Document Analysis Systems V. DAS 2002. Lecture Notes in Computer Science, vol 2423. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45869-7_28
DOI: https://doi.org/10.1007/3-540-45869-7_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44068-0
Online ISBN: 978-3-540-45869-2
eBook Packages: Springer Book Archive