Abstract
Historical document collections are a valuable resource for the study of human history. This paper proposes a novel digital image binarization scheme for low quality historical documents that allows their content to be exploited efficiently. The proposed scheme consists of five distinct steps: a pre-processing procedure using a low-pass Wiener filter; a rough estimation of foreground regions using Niblack's approach; a background surface calculation by interpolating neighboring background intensities; a thresholding that combines the calculated background surface with the original image; and finally a post-processing step that improves the quality of text regions and preserves stroke connectivity. The proposed methodology works well even on historical manuscripts with poor quality, shadows, nonuniform illumination, low contrast, large signal-dependent noise, smear and strain. In tests on numerous low quality historical manuscripts, our methodology performed better than current state-of-the-art adaptive thresholding techniques.
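The three middle steps of the scheme can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the Wiener pre-filter and the post-processing step are omitted, the function names (`niblack_foreground`, `background_surface`, `binarize`) are hypothetical, and the window half-width `half`, Niblack's `k = -0.2`, and the fixed offset `d` are assumed values chosen only for illustration. In particular, the fixed offset `d` is a simplification; the abstract does not specify how the background surface and the original image are combined.

```python
from math import sqrt


def window(img, r, c, half):
    """Values in the (2*half+1)-square window around (r, c), clipped at borders."""
    rows, cols = len(img), len(img[0])
    return [(y, x)
            for y in range(max(0, r - half), min(rows, r + half + 1))
            for x in range(max(0, c - half), min(cols, c + half + 1))]


def niblack_foreground(img, half=2, k=-0.2):
    """Rough foreground mask using Niblack's rule T = mean + k * std.

    A pixel is marked as foreground (dark text) when it is below T.
    half and k are assumed values, not taken from the paper.
    """
    mask = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            vals = [img[y][x] for y, x in window(img, r, c, half)]
            mean = sum(vals) / len(vals)
            std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            row.append(img[r][c] < mean + k * std)
        mask.append(row)
    return mask


def background_surface(img, fg, half=2):
    """Background estimate: background pixels keep their value; foreground
    pixels take the mean of neighboring background pixels in the window.

    If a window holds no background pixel, the pixel's own value is kept
    (an assumption made for this sketch).
    """
    bg = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            if not fg[r][c]:
                row.append(img[r][c])
                continue
            near = [img[y][x] for y, x in window(img, r, c, half) if not fg[y][x]]
            row.append(sum(near) / len(near) if near else img[r][c])
        bg.append(row)
    return bg


def binarize(img, bg, d=40):
    """Final mask: text where the background exceeds the image by more than d.

    The fixed offset d is an assumption of this sketch.
    """
    return [[bg[r][c] - img[r][c] > d for c in range(len(img[0]))]
            for r in range(len(img))]


if __name__ == "__main__":
    # Synthetic 7x7 page: bright background with one dark vertical stroke.
    page = [[50 if c == 3 else 200 for c in range(7)] for _ in range(7)]
    rough = niblack_foreground(page)
    surface = background_surface(page, rough)
    for line in binarize(page, surface):
        print("".join("#" if px else "." for px in line))
```

Because the final decision compares each pixel with a locally estimated background rather than with a single global level, a sketch of this kind can tolerate shadows and nonuniform illumination better than a global threshold, which is the motivation the abstract gives for the background surface step.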
References
Rosenfeld, A., Kak, A.C.: Digital Picture Processing, 2nd edn. Academic Press, New York (1982)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Systems Man Cybernet. 9(1), 62–66 (1979)
Kittler, J., Illingworth, J.: On threshold selection using clustering criteria. IEEE Trans. Systems Man Cybernet. 15, 652–655 (1985)
Brink, A.D.: Thresholding of digital images using two-dimensional entropies. Pattern Recognition 25(8), 803–808 (1992)
Yan, H.: Unified formulation of a class of image thresholding techniques. Pattern Recognition 29(12), 2025–2032 (1996)
Sahoo, P.K., Soltani, S., Wong, A.K.C.: A survey of Thresholding Techniques. Computer Vision, Graphics and Image Processing 41(2), 233–260 (1988)
Kim, I.K., Park, R.H.: Local adaptive thresholding based on a water flow model. In: Second Japan-Korea Joint Workshop on Computer Vision, Japan, pp. 21–27 (1996)
Niblack, W.: An Introduction to Digital Image Processing, pp. 115–116. Prentice Hall, Englewood Cliffs (1986)
Yang, J., Chen, Y., Hsu, W.: Adaptive thresholding algorithm and its hardware implementation. Pattern Recognition Lett. 15(2), 141–150 (1994)
Parker, J.R., Jennings, C., Salkauskas, A.G.: Thresholding using an illumination model. In: ICDAR 1993, pp. 270–273 (1993)
Sauvola, J., Pietikainen, M.: Adaptive Document Image Binarization. Pattern Recognition 33, 225–236 (2000)
Chang, M., Kang, S., Rho, W., Kim, H., Kim, D.: Improved binarization algorithm for document image by histogram and edge detection. In: ICDAR 1995, pp. 636–643 (1995)
Trier, O.D., Jain, A.K.: Goal-Directed Evaluation of Binarization Methods. IEEE Trans. on Patt. Anal. and Mach. Intell. 17(12), 1191–1201 (1995)
Eikvil, L., Taxt, T., Moen, K.: A fast adaptive method for binarization of document images. In: Int. Conf. Document Analysis and Recognition, France, pp. 435–443 (1991)
Seeger, M., Dance, C.: Binarising Camera Images for OCR. In: Sixth International Conference on Document Analysis and Recognition (ICDAR 2001), Seattle, Washington, pp. 54–58 (2001)
Jain, A.: Fundamentals of Digital Image Processing. Prentice Hall, Englewood Cliffs (1989)
Schilling, R.J.: Fundamentals of Robotics Analysis and Control. Prentice-Hall, Englewood Cliffs (1990)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 6, 707–710 (1966)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gatos, B., Pratikakis, I., Perantonis, S.J. (2004). An Adaptive Binarization Technique for Low Quality Historical Documents. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_10
DOI: https://doi.org/10.1007/978-3-540-28640-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive