Abstract
This paper presents a new method for restoring a particular type of degradation found in ancient document images. This degradation, referred to as “bleed-through”, is caused by the porosity of the paper, the chemical quality of the ink, or the conditions of digitization. It appears as marks that degrade the readability of the document image. Our goal is to remove these marks and thereby improve readability. The proposed method is based on a recursive unsupervised segmentation approach applied to the data space decorrelated by principal component analysis. It generates a binary tree in which only the leaf images that satisfy a certain condition on their logarithmic histogram are processed. Experiments on real ancient document images provided by the archives of “Chatillon-Chalaronne” illustrate the effectiveness of the proposed method.
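The general idea of recursive two-way segmentation in a PCA-decorrelated colour space can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the plain 2-means split on the first principal component, and the depth limit used as a stopping rule are all hypothetical choices. In particular, the condition on the logarithmic histogram that decides which leaves are processed is not specified in the abstract and is not reproduced here.

```python
import numpy as np

def pca_decorrelate(pixels):
    """Project N x 3 colour pixels onto their principal axes (largest variance first)."""
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending variance
    return centered @ eigvecs[:, order]

def two_means(values, iters=20):
    """Split 1-D values into two clusters; True marks the cluster nearer the upper centre."""
    lo, hi = values.min(), values.max()
    labels = np.zeros(values.shape, dtype=bool)
    for _ in range(iters):
        labels = np.abs(values - lo) > np.abs(values - hi)
        if labels.all() or not labels.any():
            break                               # degenerate split, stop iterating
        lo, hi = values[~labels].mean(), values[labels].mean()
    return labels

def split_recursively(pixels, indices, depth, max_depth, leaves):
    """Build a binary tree of pixel subsets; leaves are appended as index arrays."""
    # Assumption: a simple depth limit stands in for the paper's leaf condition.
    if depth == max_depth or len(indices) < 2:
        leaves.append(indices)
        return
    projected = pca_decorrelate(pixels[indices])
    labels = two_means(projected[:, 0])         # split on the first principal component
    if labels.all() or not labels.any():
        leaves.append(indices)                  # homogeneous region, keep as a leaf
        return
    split_recursively(pixels, indices[~labels], depth + 1, max_depth, leaves)
    split_recursively(pixels, indices[labels], depth + 1, max_depth, leaves)

# Synthetic data (made up for illustration): dark ink, mid-tone bleed-through, light paper.
rng = np.random.default_rng(0)
ink = 40 + rng.normal(0, 2, size=(200, 3))
bleed = 150 + rng.normal(0, 2, size=(200, 3))
paper = 220 + rng.normal(0, 2, size=(600, 3))
pixels = np.vstack([ink, bleed, paper])

leaves = []
split_recursively(pixels, np.arange(len(pixels)), 0, 2, leaves)
print([len(leaf) for leaf in leaves])
```

Each leaf holds the indices of one pixel subset, so a later stage could inspect the leaves individually and decide which ones to treat as bleed-through. How that decision is made is exactly the part the abstract leaves to the full paper.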
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fadoua, D., Le Bourgeois, F., Emptoz, H. (2006). Restoring Ink Bleed-Through Degraded Document Images Using a Recursive Unsupervised Classification Technique. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_4
DOI: https://doi.org/10.1007/11669487_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)