Unsupervised refinement of color and stroke features for text binarization

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Color and strokes are the salient features of text regions in an image. In this work, we use both of these features as cues and introduce a novel energy function to formulate the text binarization problem. The minimum of this energy function corresponds to the optimal binarization. We minimize the energy function with an iterative graph cut-based algorithm. Our model is robust to variations in foreground and background because we learn Gaussian mixture models for color and strokes in each iteration of the graph cut. We show results on word images from the challenging ICDAR 2003/2011, born-digital image and street view text datasets, as well as on full scene images containing text from the ICDAR 2013 datasets, and compare our performance with state-of-the-art methods. Our approach shows significant improvements under a variety of performance measures commonly used to assess text binarization schemes. In addition, our method adapts to diverse document images, such as text in videos and handwritten text images.
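The iterative scheme described in the abstract (re-estimate foreground and background models, then re-solve a graph cut) can be illustrated with a heavily simplified sketch. It is not the authors' implementation. It assumes a single Gaussian per class on grayscale intensity instead of Gaussian mixture models over color and stroke features, a constant Potts smoothness weight `lam`, an initial labeling from a mean-intensity threshold with darker pixels treated as text, and a small Edmonds-Karp max-flow written for readability. All function names and parameters are hypothetical.

```python
import math
from collections import defaultdict, deque

def fit_gaussian(values, var_floor=1.0):
    """Mean and floored variance of a list of intensities."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, var_floor)

def neg_log_lik(v, mean, var):
    """Negative log-likelihood of v under a 1-D Gaussian (the unary cost)."""
    return 0.5 * math.log(2 * math.pi * var) + (v - mean) ** 2 / (2 * var)

def add_edge(cap, u, v, c):
    cap[u][v] = cap[u].get(v, 0.0) + c
    cap[v].setdefault(u, 0.0)  # make sure the reverse residual edge exists

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max-flow; returns the nodes still reachable from s."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue:  # breadth-first search in the residual graph
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

def binarize(img, lam=2.0, max_iters=10):
    """Alternate model fitting and graph cut until the labels stop changing."""
    h, w = len(img), len(img[0])
    pix = [v for row in img for v in row]
    thresh = sum(pix) / len(pix)
    labels = [1 if v < thresh else 0 for v in pix]  # assumption: dark = text
    for _ in range(max_iters):
        fg = [v for v, l in zip(pix, labels) if l == 1]
        bg = [v for v, l in zip(pix, labels) if l == 0]
        if not fg or not bg:
            break
        fg_model, bg_model = fit_gaussian(fg), fit_gaussian(bg)
        cap = defaultdict(dict)
        for i, v in enumerate(pix):
            u_fg = neg_log_lik(v, *fg_model)
            u_bg = neg_log_lik(v, *bg_model)
            base = min(u_fg, u_bg)  # shift so both capacities are >= 0
            add_edge(cap, "s", i, u_bg - base)  # paid if i ends up background
            add_edge(cap, i, "t", u_fg - base)  # paid if i ends up foreground
            r, c = divmod(i, w)
            for j in ([i + 1] if c + 1 < w else []) + ([i + w] if r + 1 < h else []):
                add_edge(cap, i, j, lam)  # Potts penalty for differing labels
                add_edge(cap, j, i, lam)
        source_side = min_cut_source_side(cap, "s", "t")
        new_labels = [1 if i in source_side else 0 for i in range(len(pix))]
        if new_labels == labels:
            break
        labels = new_labels
    return [labels[r * w:(r + 1) * w] for r in range(h)]
```

Calling `binarize(img)` on a list of rows of grayscale values returns a grid of the same shape with 1 for pixels assigned to text and 0 for background. Replacing the single Gaussians with mixture models and adding a stroke-based term would bring the sketch closer to the formulation summarized above.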

Notes

  1. Other color spaces such as CMYK or HSV can also be used.

  2. The stroke-based term is computed similarly with stroke width and intensity of each pixel generated from one of the 2c GMMs.

  3. Source code for all the performance measures used in this work is available on our project Web site [24].

  4. Skeleton, also known as morphological skeleton, is a medial axis representation of a binary image computed with morphological operators [59].

  5. We thank the authors for providing the implementation of their methods.
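
The morphological skeleton mentioned in note 4 can be sketched with plain erosions and openings. The snippet below is an illustration under stated assumptions, not the procedure used in the paper: it follows the classical union-of-residues construction (each erosion of the shape minus its opening), represents the binary image as a set of `(row, col)` foreground coordinates, and uses a 3x3 square structuring element. The function names are hypothetical.

```python
def erode(pixels, offsets):
    """Keep a pixel only if every offset neighbor is also foreground."""
    return {(r, c) for (r, c) in pixels
            if all((r + dr, c + dc) in pixels for dr, dc in offsets)}

def dilate(pixels, offsets):
    """Stamp the structuring element onto every foreground pixel."""
    return {(r + dr, c + dc) for (r, c) in pixels for dr, dc in offsets}

# Assumption: a symmetric 3x3 square structuring element containing the origin.
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def morphological_skeleton(pixels, offsets=SQUARE_3X3):
    """Union over k of (k-fold erosion) minus (opening of the k-fold erosion)."""
    skeleton = set()
    eroded = set(pixels)
    while eroded:
        opened = dilate(erode(eroded, offsets), offsets)
        skeleton |= eroded - opened
        eroded = erode(eroded, offsets)
    return skeleton
```

For a solid bar three pixels tall and seven pixels wide, this returns the five interior pixels of the middle row, which matches the intuition of a medial axis for that shape under a square structuring element.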

References

  1. Chen, Y., Wang, L.: Broken and degraded document images binarization. Neurocomputing 237, 272–280 (2017)

  2. Jia, F., Shi, C., He, K., Wang, C., Xiao, B.: Document image binarization using structural symmetry of strokes. In: ICFHR, pp. 411–416 (2016)

  3. Stathis, P., Kavallieratou, E., Papamarkos, N.: An evaluation technique for binarization algorithms. J. Univers. Comput. Sci. 14(18), 3011–3030 (2008)

  4. Howe, N.R.: A Laplacian energy for document binarization. In: ICDAR (2011)

  5. Howe, N.R.: Document binarization with automatic parameter tuning. Int. J. Doc. Anal. Recognit. 16(3), 247–258 (2013)

  6. Valizadeh, M., Kabir, E.: Binarization of degraded document image based on feature space partitioning and classification. Int. J. Doc. Anal. Recognit. 15(1), 57–69 (2012)

  7. Lazzara, G., Géraud, T.: Efficient multiscale Sauvola’s binarization. Int. J. Doc. Anal. Recognit. 17(2), 105–123 (2014)

  8. Mishra, A., Alahari, K., Jawahar, C.V.: An MRF model for binarization of natural scene text. In: ICDAR (2011)

  9. Milyaev, S., Barinova, O., Novikova, T., Kohli, P., Lempitsky, V.: Fast and accurate scene text understanding with image binarization and off-the-shelf OCR. Int. J. Doc. Anal. Recognit. 18(2), 169–182 (2015)

  10. Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICDAR 2013 document image binarization contest (DIBCO 2013). In: ICDAR (2013)

  11. Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICFHR 2012 competition on handwritten document image binarization. In: ICFHR (2012)

  12. Ntirogiannis, K., Gatos, B., Pratikakis, I.: ICFHR2014 competition on handwritten document image binarization (H-DIBCO 2014). In: ICFHR (2014)

  13. Boykov, Y.Y., Jolly, M.-P.: Interactive graph cuts for optimal boundary and region segmentation of objects in ND images. In: ICCV (2001)

  14. Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)

  15. Blake, A., Rother, C., Brown, M., Perez, P., Torr, P.: Interactive image segmentation using an adaptive GMMRF model. In: ECCV (2004)

  16. Kittler, J., Illingworth, J., Föglein, J.: Threshold selection based on a simple image statistic. Comput. Vis. Graph. Image Process. 30(2), 125–147 (1985)

  17. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)

  18. Kasar, T., Kumar, J., Ramakrishnan, A.: Font and background color independent text binarization. In: CBDAR (2007)

  19. Niblack, W.: An Introduction to Digital Image Processing. Strandberg Publishing Company (1985)

  20. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recognit. 33(2), 225–236 (2000)

  21. Wolf, C., Doermann, D.: Binarization of low quality text using a Markov random field model. In: ICPR (2002)

  22. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazán, J., de las Heras, L.: ICDAR 2013 robust reading competition. In: ICDAR (2013)

  23. Tesseract OCR. http://code.google.com/p/tesseract-ocr/

  24. Project website. http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/

  25. Thillou, C., Gosselin, B.: Color binarization for complex camera-based images. In: Electronic imaging (2005)

  26. Kita, K., Wakahara, T.: Binarization of color characters in scene images using k-means clustering and support vector machines. In: ICPR (2010)

  27. Lu, S., Su, B., Tan, C.L.: Document image binarization using background estimation and stroke edges. Int. J. Doc. Anal. Recognit. 13(4), 303–314 (2010)

  28. Kuk, J.G., Cho, N.I.: Feature based binarization of document images degraded by uneven light condition. In: ICDAR (2009)

  29. Peng, X., Setlur, S., Govindaraju, V., Sitaram, R.: Markov random field based binarization for hand-held devices captured document images. In: ICVGIP (2010)

  30. Zhang, H., Liu, C., Yang, C., Ding, X., Wang, K.: An improved scene text extraction method using conditional random field and optical character recognition. In: ICDAR (2011)

  31. Pan, Y.-F., Hou, X., Liu, C.-L.: Text localization in natural scene images based on conditional random field. In: ICDAR (2009)

  32. Hebert, D., Nicolas, S., Paquet, T.: Discrete CRF based combination framework for document image binarization. In: ICDAR (2013)

  33. Gatos, B., Pratikakis, I., Kepene, K., Perantonis, S.: Text detection in indoor/outdoor scene images. In: CBDAR (2005)

  34. Ezaki, N., Bulacu, M., Schomaker, L.: Text detection from natural scene images: towards a system for visually impaired persons. In: ICPR (2004)

  35. Gomez, L., Karatzas, D.: A fast hierarchical method for multi-script and arbitrary oriented scene text extraction. arXiv preprint arXiv:1407.7504 (2014)

  36. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR (2010)

  37. Feild, J., Learned-Miller, E.: Scene text recognition with bilateral regression. University of Massachusetts-Amherst, Computer Science Research Center, Tech. Rep. UM-CS-2012-021 (2013)

  38. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)

  39. Tian, S., Lu, S., Su, B., Tan, C.L.: Scene text segmentation with multi-level maximally stable extremal regions. In: ICPR (2014)

  40. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: BMVC (2002)

  41. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1124–1137 (2004)

  42. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)

  43. Boros, E., Hammer, P.L.: Pseudo-boolean optimization. Discrete Appl. Math. 123(1), 155–225 (2002)

  44. Reynolds, D.A.: Gaussian mixture models. In: Encyclopedia of Biometrics, Second Edition, pp. 827–832 (2015)

  45. Sosa, L.P., Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: ICDAR (2003)

  46. Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: ICDAR (2011)

  47. Karatzas, D., Mestre, S.R., Mas, J., Nourbakhsh, F., Roy, P.P.: ICDAR 2011 robust reading competition—challenge 1: reading text in born-digital images (web and email). In: ICDAR (2011)

  48. Wang, K., Belongie, S.: Word spotting in the wild. In: ECCV (2010)

  49. ICDAR 2015 Competition on Video Script Identification. http://www.ict.griffith.edu.au/cvsi2015/

  50. ICDAR 2003 dataset. http://algoval.essex.ac.uk/icdar/RobustWord.html

  51. ICDAR 2011 dataset. http://robustreading.opendfki.de/trac/wiki/SceneText

  52. Kumar, D., Prasad, M., Ramakrishnan, A.: Benchmarking recognition results on camera captured word image data sets. In: DAR (2012)

  53. Milyaev, S., Barinova, O., Novikova, T., Kohli, P., Lempitsky, V.: Image binarization for end-to-end text understanding in natural images. In: ICDAR (2013)

  54. Clavelli, A., Karatzas, D., Lladós, J.: A framework for the assessment of text extraction algorithms on complex colour images. In: DAS (2010)

  55. Lopresti, D., Zhou, J.: Locating and recognizing text in WWW images. Inf. Retr. 2(2–3), 177–206 (2000)

  56. Karatzas, D., Antonacopoulos, A.: Colour text segmentation in web images based on human perception. Image Vis. Comput. 25(5), 564–577 (2007)

  57. Kumar, D., Prasad, M.A., Ramakrishnan, A.: NESP: nonlinear enhancement and selection of plane for optimal segmentation and recognition of scene word images. In: IS&T/SPIE Electronic Imaging (2013)

  58. Smith, E.H.B.: An analysis of binarization ground truthing. In: DAS (2010)

  59. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice-Hall of India Pvt. Ltd, Delhi (2005)

  60. ABBYY Finereader 8.0. http://www.abbyy.com/

  61. Shahab, A., Shafait, F., Dengel, A.: ICDAR2011 robust reading competition challenge 2: Reading text in scene images. In: ICDAR (2011)

  62. Lu, Z., Wu, Z., Brown, M.S.: Directed assistance for ink-bleed reduction in old documents. In: CVPR (2009)

  63. Lu, Z., Wu, Z., Brown, M.S.: Interactive degraded document binarization: an example (and case) for interactive computer vision. In: WACV (2009)

  64. Mishra, A., Alahari, K., Jawahar, C.: Top-down and bottom-up cues for scene text recognition. In: CVPR (2012)

  65. Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: ECCV (2014)

  66. Novikova, T., Barinova, O., Kohli, P., Lempitsky, V.: Large-lexicon attribute-consistent text recognition in natural images. In: ECCV (2012)

  67. Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S., Zhang, Z.: Scene text recognition using part-based tree-structured character detection. In: CVPR (2013)

  68. Jahangiri, M., Heesch, D.: Modified grabcut for unsupervised object segmentation. In: ICIP (2009)

  69. Khattab, D., Ebied, H.M., Hussein, A.S., Tolba, M.F.: Multi-label automatic grabcut for image segmentation. In: HIS (2014)

  70. Khattab, D., Ebied, H.M., Hussein, A.S., Tolba, M.F.: Color image segmentation based on different color space models using automatic GrabCut. Sci. World J. 2014, 10 (2014)

  71. Jegelka, S., Bilmes, J.: Submodularity beyond submodular energies: coupling edges in graph cuts. In: CVPR (2011)

Acknowledgements

This work was partially supported by the Indo-French Project No. 5302-1, EVEREST, funded by CEFIPRA. Anand Mishra was supported by Microsoft Corporation and Microsoft Research India under the Microsoft Research India Ph.D. fellowship award.

Author information

Corresponding author

Correspondence to Anand Mishra.

Cite this article

Mishra, A., Alahari, K. & Jawahar, C.V. Unsupervised refinement of color and stroke features for text binarization. IJDAR 20, 105–121 (2017). https://doi.org/10.1007/s10032-017-0283-9

  • DOI: https://doi.org/10.1007/s10032-017-0283-9
