Historical Document Binarization Combining Semantic Labeling and Graph Cuts

  • Kalyan Ram Ayyalasomayajula
  • Anders Brun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10269)


Most data mining applications on collections of historical documents require binarization of the digitized images as a pre-processing step. Historical documents often suffer from degradations such as parchment aging, smudges, and bleed-through from the other side of the page. The text is sometimes printed, but more often handwritten. Mathematical modeling of the appearance of the text, the background, and the many kinds of degradation is challenging. In the current work we treat binarization as a pixel classification problem. We first apply semantic segmentation using fully convolutional neural networks. To improve the sharpness of the result, we then apply a graph cut algorithm. The labels from the semantic segmentation are used as approximate estimates of the text and background, and the probability map of the background is used for pruning the edges in the graph cut. The results obtained show significant improvement over the state-of-the-art approach.
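The two-stage idea in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: `p_text` is a hand-written stand-in for the per-pixel text probabilities a segmentation network would produce, the unary costs are negative log-probabilities, the function name `binarize` and the parameters `smooth` and `prune` are hypothetical, and the pruning rule (drop a neighbour edge when the background probabilities of its two pixels differ by more than `prune`) is only one plausible reading of "pruning the edges". The minimum cut is found with a plain Edmonds-Karp max-flow written with the standard library.

```python
import math
from collections import deque


def binarize(p_text, smooth=1.0, prune=0.5):
    """Label pixels as text (1) or background (0) with an s-t minimum cut.

    p_text: 2-D list of text probabilities (stand-in for network output).
    smooth: capacity of neighbour edges (assumed parameter).
    prune:  neighbour edges are dropped when the background probabilities
            of the two pixels differ by more than this (assumed rule).
    """
    h, w = len(p_text), len(p_text[0])
    src, sink = h * w, h * w + 1
    cap = [dict() for _ in range(h * w + 2)]  # residual capacities

    def add_edge(u, v, c_uv, c_vu):
        cap[u][v] = cap[u].get(v, 0.0) + c_uv
        cap[v][u] = cap[v].get(u, 0.0) + c_vu

    eps = 1e-6
    for y in range(h):
        for x in range(w):
            i = y * w + x
            p = min(max(p_text[y][x], eps), 1.0 - eps)
            # Source side means "text". Cutting src->i labels i background,
            # so that edge carries the background cost, and vice versa.
            add_edge(src, i, -math.log(1.0 - p), 0.0)
            add_edge(i, sink, -math.log(p), 0.0)
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    # |P_bg(a) - P_bg(b)| equals |P_text(a) - P_text(b)|.
                    if abs(p_text[y][x] - p_text[ny][nx]) <= prune:
                        add_edge(i, ny * w + nx, smooth, smooth)

    def reachable_from_source():
        """Breadth-first search over edges with remaining capacity."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        flow, v = float("inf"), sink
        while parent[v] is not None:  # bottleneck along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # push flow, update residuals
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Pixels still reachable from the source lie on the text side of the cut.
    return [[1 if (y * w + x) in parent else 0 for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    probs = [[0.9, 0.9, 0.4, 0.9, 0.9],
             [0.1, 0.1, 0.1, 0.1, 0.1]]
    for row in binarize(probs, smooth=1.0, prune=0.6):
        print(row)
```

In this toy input, the uncertain pixel (0.4) in the middle of the stroke would be labelled background by thresholding alone; with the smoothness edges it is pulled to the text label by its neighbours, while the pruned edges between the stroke and the row below keep the two regions from influencing each other.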


Keywords: Binarization, Semantic labeling, Deep learning, Graph cut, Zero shot learning



This project is a part of q2b, From quill to bytes, an initiative sponsored by the Swedish Research Council (Vetenskapsrådet D.Nr 2012-5743), Riksbankens Jubileumsfond (R.Nr NHS14-2068:1), and Uppsala University. The authors would like to thank Fredrik Wahlberg and Tomas Wilkinson of the Dept. of Information Tech., Uppsala University, and the anonymous reviewers for their constructive criticism, which helped improve the manuscript.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Information Technology, Centre for Image Analysis, Uppsala University, Uppsala, Sweden
