Historical Document Binarization Combining Semantic Labeling and Graph Cuts
Most data mining applications on collections of historical documents require binarization of the digitized images as a pre-processing step. Historical documents are often subject to degradations such as parchment aging, smudges, and bleed-through from the reverse side. The text is sometimes printed but more often handwritten. Mathematically modeling the appearance of the text, the background, and every kind of degradation is challenging. In this work we treat binarization as a pixel classification problem. We first apply semantic segmentation using fully convolutional neural networks. To improve the sharpness of the result, we then apply a graph cut algorithm. The labels from the semantic segmentation serve as approximate estimates of the text and background, and the background probability map is used to prune edges in the graph cut. The results show a significant improvement over the state-of-the-art approach.
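The second stage described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a background probability map is already available (here a plain nested list, standing in for the network output) and solves a two-label minimum cut with a simple Edmonds-Karp max-flow. The function name `graph_cut_binarize`, the parameters `smoothness` and `prune_gap`, and in particular the pruning rule (dropping the link between neighbours whose background probabilities differ by at least `prune_gap`) are all assumptions made for illustration. The abstract does not specify how the edges are pruned.

```python
import math
from collections import deque


def graph_cut_binarize(prob_bg, smoothness=1.0, prune_gap=0.5):
    """Label each pixel text (1) or background (0) with a minimum s-t cut.

    prob_bg is a 2D list of background probabilities, a hypothetical
    stand-in for the output of a segmentation network.
    """
    h, w = len(prob_bg), len(prob_bg[0])
    src, snk = h * w, h * w + 1
    cap = [dict() for _ in range(h * w + 2)]  # residual capacities

    def add_edge(u, v, c_uv, c_vu):
        cap[u][v] = cap[u].get(v, 0.0) + c_uv
        cap[v][u] = cap[v].get(u, 0.0) + c_vu

    eps = 1e-6  # keeps the logarithms finite
    for y in range(h):
        for x in range(w):
            p = min(max(prob_bg[y][x], eps), 1.0 - eps)
            i = y * w + x
            # Cutting src->i puts i on the background side: cost -log P(bg).
            add_edge(src, i, -math.log(p), 0.0)
            # Cutting i->snk keeps i on the text side: cost -log P(text).
            add_edge(i, snk, -math.log(1.0 - p), 0.0)
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    # Assumed pruning rule: no smoothness link across a
                    # large jump in background probability.
                    if abs(prob_bg[ny][nx] - prob_bg[y][x]) < prune_gap:
                        add_edge(i, ny * w + nx, smoothness, smoothness)

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parent = bfs_parents()
        if snk not in parent:
            break
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= flow
            cap[b][a] += flow

    # Pixels still reachable from the source lie on the text side of the cut.
    return [[1 if y * w + x in parent else 0 for x in range(w)]
            for y in range(h)]
```

On the one-row map `[[0.1, 0.6, 0.1]]`, calling the function with `smoothness=0.0` reduces to thresholding the probabilities at 0.5, so the uncertain middle pixel is labeled background. With `smoothness=1.0` and a `prune_gap` large enough that no link is dropped, the middle pixel joins its text neighbours. A production pipeline would more likely rely on a dedicated max-flow library than on this pure-Python solver.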
Keywords: Binarization, Semantic labeling, Deep learning, Graph cut, Zero-shot learning
This project is part of q2b, From quill to bytes, an initiative sponsored by the Swedish Research Council (Vetenskapsrådet D.Nr 2012-5743), Riksbankens Jubileumsfond (R.Nr NHS14-2068:1), and Uppsala University. The authors would like to thank Fredrik Wahlberg and Tomas Wilkinson of the Dept. of Information Tech., Uppsala University, as well as the anonymous reviewers, whose constructive criticism helped improve the manuscript.