An Adaptive Binarization Technique for Low Quality Historical Documents
Historical document collections are a valuable resource for the study of human history. This paper proposes a novel binarization scheme for low-quality historical document images that enables efficient further exploitation of their content. The proposed scheme consists of five distinct steps: a pre-processing procedure using a low-pass Wiener filter; a rough estimation of foreground regions using Niblack's approach; a background surface calculation by interpolating neighboring background intensities; a thresholding step that combines the calculated background surface with the original image; and, finally, a post-processing step that improves the quality of text regions and preserves stroke connectivity. The methodology remains effective on historical manuscripts with poor quality, shadows, nonuniform illumination, low contrast, large signal-dependent noise, smear and strain. In tests on numerous low-quality historical manuscripts, the proposed methodology performed better than current state-of-the-art adaptive thresholding techniques.
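The five steps listed in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no parameters or formulas, so everything specific here is an assumption made for illustration: the function names, the window sizes, the Niblack weight of -0.2, the rule that a pixel is text when it lies more than a fraction `q` of the average foreground-to-background distance below the background surface, and the neighbour-count cleanup standing in for the post-processing step. It should be read as one plausible reading of the pipeline, not as the authors' implementation.

```python
import numpy as np

def box_mean(a, k):
    """Mean of `a` over a k x k window (k odd); borders use edge replication."""
    r = k // 2
    p = np.pad(np.asarray(a, dtype=float), r, mode="edge")
    # Summed-area table with a leading zero row and column.
    s = np.pad(p, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    h, w = p.shape[0] - 2 * r, p.shape[1] - 2 * r
    total = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return total / (k * k)

def local_stats(img, k):
    """Local mean and (non-negative) variance over a k x k window."""
    mu = box_mean(img, k)
    var = np.maximum(box_mean(img * img, k) - mu * mu, 0.0)
    return mu, var

def binarize(img, win=15, niblack_k=-0.2, q=0.5):
    """Return a boolean mask that is True on (dark) text pixels.

    All parameter values are illustrative assumptions.
    """
    img = np.asarray(img, dtype=float)

    # 1. Pre-processing: adaptive (local-statistics) Wiener filter, 3 x 3.
    mu, var = local_stats(img, 3)
    noise = var.mean()  # noise power estimated as the mean local variance
    gain = np.maximum(var - noise, 0.0) / np.maximum(var, noise + 1e-12)
    smooth = mu + gain * (img - mu)

    # 2. Rough foreground estimate: Niblack threshold T = m + k * s.
    mu, var = local_stats(smooth, win)
    fg = smooth < mu + niblack_k * np.sqrt(var)

    # 3. Background surface: keep background pixels as they are; at rough
    #    foreground pixels, average the background pixels in the window.
    bg = (~fg).astype(float)
    weight = box_mean(bg, win)
    interp = box_mean(smooth * bg, win) / np.maximum(weight, 1e-12)
    background = np.where(fg & (weight > 0), interp, smooth)

    # 4. Thresholding: combine the background surface with the image.
    #    Assumed rule: text if the pixel is darker than the background by
    #    more than q times the average foreground-to-background distance.
    diff = background - smooth
    delta = diff[fg].mean() if fg.any() else 0.0
    text = diff > q * delta

    # 5. Post-processing (assumed stand-in): drop text pixels with fewer
    #    than 2 text neighbours and fill holes with at least 7, counted in
    #    a 3 x 3 neighbourhood with replicated borders.
    neighbours = np.rint(box_mean(text, 3) * 9) - text
    return (text & (neighbours >= 2)) | (~text & (neighbours >= 7))

# Synthetic page: left-to-right illumination gradient with one dark stroke.
page = np.tile(np.linspace(150.0, 230.0, 40), (40, 1))
page[5:35, 18:22] -= 100.0
result = binarize(page)
```

In this toy example, Niblack's rough estimate also flags a few columns at the dark end of the gradient, but the comparison against the interpolated background surface in step 4 rejects them, which is the motivation for combining the two.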
- 1. Rosenfeld, A., Kak, A.C.: Digital Picture Processing, 2nd edn. Academic Press, New York (1982)
- 3. Kittler, J., Illingworth, J.: On threshold selection using clustering criteria. IEEE Trans. Systems Man Cybernet. 15, 652–655 (1985)
- 7. Kim, I.K., Park, R.H.: Local adaptive thresholding based on a water flow model. In: Second Japan-Korea Joint Workshop on Computer Vision, Japan, pp. 21–27 (1996)
- 8. Niblack, W.: An Introduction to Digital Image Processing, pp. 115–116. Prentice Hall, Englewood Cliffs (1986)
- 10. Parker, J.R., Jennings, C., Salkauskas, A.G.: Thresholding using an illumination model. In: ICDAR 1993, pp. 270–273 (1993)
- 12. Chang, M., Kang, S., Rho, W., Kim, H., Kim, D.: Improved binarization algorithm for document image by histogram and edge detection. In: ICDAR 1995, pp. 636–643 (1995)
- 14. Eikvil, L., Taxt, T., Moen, K.: A fast adaptive method for binarization of document images. In: Int. Conf. Document Analysis and Recognition, France, pp. 435–443 (1991)
- 15. Seeger, M., Dance, C.: Binarising Camera Images for OCR. In: Sixth International Conference on Document Analysis and Recognition (ICDAR 2001), Seattle, Washington, pp. 54–58 (2001)
- 17. Schilling, R.J.: Fundamentals of Robotics Analysis and Control. Prentice-Hall, Englewood Cliffs (1990)