Unsupervised refinement of color and stroke features for text binarization

  • Anand Mishra
  • Karteek Alahari
  • C. V. Jawahar
Original Paper


Color and strokes are the salient features of text regions in an image. In this work, we use both of these features as cues and introduce a novel energy function to formulate the text binarization problem. The minimum of this energy function corresponds to the optimal binarization. We minimize the energy function with an iterative graph cut-based algorithm. Our model is robust to variations in foreground and background because we learn Gaussian mixture models for color and strokes in each iteration of the graph cut. We show results on word images from the challenging ICDAR 2003/2011, born-digital image, and street view text datasets, as well as on full scene images containing text from the ICDAR 2013 datasets, and compare our performance with state-of-the-art methods. Our approach shows significant improvements under a variety of performance measures commonly used to assess text binarization schemes. In addition, our method adapts to diverse document images, such as text in videos and handwritten text images.
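The loop described in the abstract (learn foreground and background models, minimize an energy, repeat) can be illustrated with a heavily simplified sketch. It is not the authors' implementation, and it rests on several assumptions: a single Gaussian over grey values stands in for the Gaussian mixture models over color and stroke features, iterated conditional modes (ICM) stands in for the graph cut, and every name and parameter value (`binarize`, `energy`, `lam`, `beta`, `min_var`) is hypothetical.

```python
import math

TEXT, BACKGROUND = 1, 0  # hypothetical label names

def fit_gaussian(values, min_var=1.0):
    """Return (mean, variance) of grey values; variance is floored at min_var."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, min_var)

def unary(value, model):
    """Negative log-likelihood of a grey value under a (mean, variance) Gaussian."""
    mean, var = model
    return 0.5 * math.log(2 * math.pi * var) + (value - mean) ** 2 / (2 * var)

def pairwise(a, b, lam, beta):
    """Contrast-sensitive penalty paid when two neighbours take different labels."""
    return lam * math.exp(-beta * (a - b) ** 2)

def neighbours(r, c, rows, cols):
    """Yield the 4-connected neighbours of pixel (r, c) that lie inside the image."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield r + dr, c + dc

def energy(image, labels, models, lam=1.0, beta=0.01):
    """Unary terms for all pixels plus pairwise terms, counting each pair once."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += unary(image[r][c], models[labels[r][c]])
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and down only
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    total += pairwise(image[r][c], image[rr][cc], lam, beta)
    return total

def binarize(image, init_labels, iterations=5, lam=1.0, beta=0.01):
    """Alternate between refitting the two models and relabelling pixels.

    Assumption: ICM replaces the graph cut, so each sweep only finds a
    local minimum of the energy rather than the global one.
    """
    rows, cols = len(image), len(image[0])
    labels = [row[:] for row in init_labels]
    for _ in range(iterations):
        # Re-learn both models from the current labelling.
        samples = {TEXT: [], BACKGROUND: []}
        for r in range(rows):
            for c in range(cols):
                samples[labels[r][c]].append(image[r][c])
        if not samples[TEXT] or not samples[BACKGROUND]:
            break  # one class vanished; nothing left to refine
        models = {lab: fit_gaussian(vals) for lab, vals in samples.items()}
        # One ICM sweep: give each pixel its locally cheapest label.
        changed = False
        for r in range(rows):
            for c in range(cols):
                costs = {}
                for lab in (BACKGROUND, TEXT):
                    cost = unary(image[r][c], models[lab])
                    for rr, cc in neighbours(r, c, rows, cols):
                        if labels[rr][cc] != lab:
                            cost += pairwise(image[r][c], image[rr][cc], lam, beta)
                    costs[lab] = cost
                best = min(costs, key=costs.get)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # labelling is stable
    return labels
```

Calling `binarize(image, init_labels)` with a grey-value grid and a rough initial labelling of the same shape returns the refined labels. The values of `lam`, `beta`, and `min_var` are illustrative defaults, not settings taken from the paper.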


Gaussian Mixture Model, Document Image, Conditional Random Field, Binarization Method, Word Image



This work was partially supported by the Indo-French Project No. 5302-1, EVEREST, funded by CEFIPRA. Anand Mishra was supported by Microsoft Corporation and Microsoft Research India under the Microsoft Research India Ph.D. fellowship award.



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Center for Visual Information Technology, IIIT Hyderabad, Hyderabad, India
  2. Thoth Team, Inria, Grenoble, France
  3. Laboratoire Jean Kuntzmann, Université Grenoble Alpes, Grenoble, France
