Abstract
Color and strokes are the salient features of text regions in an image. In this work, we use both features as cues and introduce a novel energy function to formulate the text binarization problem. The minimum of this energy function corresponds to the optimal binarization. We minimize the energy function with an iterative graph cut-based algorithm. Our model is robust to variations in foreground and background because we learn Gaussian mixture models for color and strokes in each iteration of the graph cut. We show results on word images from the challenging ICDAR 2003/2011, born-digital image, and street view text datasets, as well as on full scene images containing text from the ICDAR 2013 datasets, and compare our performance with state-of-the-art methods. Our approach shows significant improvements under a variety of performance measures commonly used to assess text binarization schemes. In addition, our method adapts to diverse document images, such as text in videos and handwritten text images.
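For orientation, a binarization energy of this kind can be written in the generic form used by graph cut methods such as GrabCut. The expressions below are a sketch under that assumption and are not necessarily the exact function defined in the paper; the weights w_1 and w_2 and the neighborhood set \mathcal{N} are illustrative symbols.

```latex
% Binary label x_i \in \{0, 1\} per pixel (background / text).
% Generic energy: per-pixel data terms plus pairwise smoothness terms.
E(\mathbf{x}) = \sum_{i} E_i(x_i) + \sum_{(i,j) \in \mathcal{N}} E_{ij}(x_i, x_j)

% One plausible way to use both cues: a weighted sum of a color-based
% and a stroke-based energy (the weights are hypothetical).
E_{\text{all}}(\mathbf{x}) = w_1 \, E_{\text{color}}(\mathbf{x}) + w_2 \, E_{\text{stroke}}(\mathbf{x})

% The binarization is the labeling that minimizes the energy.
\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E_{\text{all}}(\mathbf{x})
```

In an iterative scheme, the mixture models behind the data terms would be re-estimated from the current labeling before each new minimization.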
Notes
Other color spaces such as CMYK or HSV can also be used.
The stroke-based term is computed similarly, with the stroke width and intensity of each pixel assumed to be generated from one of the 2c GMMs.
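As a minimal sketch of how such a term can be evaluated (not the authors' implementation), the snippet below scores a pixel's (stroke width, intensity) feature by its negative log-likelihood under the best-fitting Gaussian component, in the style of GrabCut. It assumes that 2c refers to c components each for the text and background models, and it uses diagonal covariances; the function names and all parameter values are hypothetical.

```python
import math


def gaussian_nll(feature, weight, mean, var):
    """Return -log(weight * N(feature; mean, diag(var))) for one component."""
    nll = -math.log(weight)
    for f, m, v in zip(feature, mean, var):
        nll += 0.5 * math.log(2.0 * math.pi * v) + (f - m) ** 2 / (2.0 * v)
    return nll


def unary_cost(feature, components):
    """Cost under the best-fitting component of one model (text or background)."""
    return min(gaussian_nll(feature, w, mu, var) for (w, mu, var) in components)


# Hypothetical models with c = 2 components each: (weight, mean, variance).
# Feature layout is assumed to be (stroke width in pixels, intensity in [0, 1]).
text_model = [(0.5, (4.0, 0.2), (1.0, 0.01)), (0.5, (6.0, 0.3), (1.0, 0.01))]
background_model = [(0.5, (20.0, 0.8), (25.0, 0.02)), (0.5, (0.0, 0.7), (1.0, 0.02))]

pixel = (4.5, 0.22)
cost_fg = unary_cost(pixel, text_model)
cost_bg = unary_cost(pixel, background_model)
label = "text" if cost_fg < cost_bg else "background"
print(label)
```

With these made-up parameters the pixel is far cheaper to explain with the text model, so a graph cut would need a strong smoothness penalty to label it as background.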
Source code for all the performance measures used in this work is available on our project Web site [24].
The skeleton, also known as the morphological skeleton, is a medial axis representation of a binary image computed with morphological operators [59].
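One standard construction is Lantuéjoul's formula, which takes the union, over successive erosions, of each eroded image minus its opening. The sketch below implements it on a set of (row, column) foreground pixels with a 3×3 square structuring element. It is an illustrative pure-Python version, not necessarily the variant used in this work, and the helper names are assumptions.

```python
def erode(pixels, se):
    """Keep pixels whose whole structuring-element neighborhood is foreground."""
    return {
        (y, x)
        for (y, x) in pixels
        if all((y + dy, x + dx) in pixels for dy, dx in se)
    }


def dilate(pixels, se):
    """Add every structuring-element offset to every foreground pixel."""
    return {(y + dy, x + dx) for (y, x) in pixels for dy, dx in se}


def opening(pixels, se):
    """Erosion followed by dilation with the same structuring element."""
    return dilate(erode(pixels, se), se)


def morphological_skeleton(pixels, se):
    """Union over k of (A eroded k times) minus its opening."""
    skeleton = set()
    eroded = set(pixels)
    while eroded:
        skeleton |= eroded - opening(eroded, se)
        eroded = erode(eroded, se)
    return skeleton


# 3x3 square structuring element, including the origin.
SQUARE = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

# A 3x7 horizontal bar; its skeleton is the middle row without the end pixels.
bar = {(y, x) for y in range(3) for x in range(7)}
skeleton = morphological_skeleton(bar, SQUARE)
print(sorted(skeleton))  # [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5)]
```

The result depends on the structuring element, and a morphological skeleton computed this way is not guaranteed to be connected.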
We thank the authors for providing the implementation of their methods.
References
Chen, Y., Wang, L.: Broken and degraded document images binarization. Neurocomputing 237, 272–280 (2017)
Jia, F., Shi, C., He, K., Wang, C., Xiao, B.: Document image binarization using structural symmetry of strokes. In: ICFHR, pp. 411–416 (2016)
Stathis, P., Kavallieratou, E., Papamarkos, N.: An evaluation technique for binarization algorithms. J. Univers. Comput. Sci. 14(18), 3011–3030 (2008)
Howe, N.R.: A Laplacian energy for document binarization. In: ICDAR (2011)
Howe, N.R.: Document binarization with automatic parameter tuning. Int. J. Doc. Anal. Recognit. 16(3), 247–258 (2013)
Valizadeh, M., Kabir, E.: Binarization of degraded document image based on feature space partitioning and classification. Int. J. Doc. Anal. Recognit. 15(1), 57–69 (2012)
Lazzara, G., Géraud, T.: Efficient multiscale Sauvola’s binarization. Int. J. Doc. Anal. Recognit. 17(2), 105–123 (2014)
Mishra, A., Alahari, K., Jawahar, C.V.: An MRF model for binarization of natural scene text. In: ICDAR (2011)
Milyaev, S., Barinova, O., Novikova, T., Kohli, P., Lempitsky, V.: Fast and accurate scene text understanding with image binarization and off-the-shelf OCR. Int. J. Doc. Anal. Recognit. 18(2), 169–182 (2015)
Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICDAR 2013 document image binarization contest (DIBCO 2013). In: ICDAR (2013)
Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICFHR 2012 competition on handwritten document image binarization. In: ICFHR (2012)
Ntirogiannis, K., Gatos, B., Pratikakis, I.: ICFHR2014 competition on handwritten document image binarization (H-DIBCO 2014). In: ICFHR (2014)
Boykov, Y.Y., Jolly, M.-P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: ICCV (2001)
Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
Blake, A., Rother, C., Brown, M., Perez, P., Torr, P.: Interactive image segmentation using an adaptive GMMRF model. In: ECCV (2004)
Kittler, J., Illingworth, J., Föglein, J.: Threshold selection based on a simple image statistic. Comput. Vis. Graph. Image Process. 30(2), 125–147 (1985)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Kasar, T., Kumar, J., Ramakrishnan, A.: Font and background color independent text binarization. In: CBDAR (2007)
Niblack, W.: An Introduction to Digital Image Processing. Strandberg Publishing Company (1985)
Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recognit. 33(2), 225–236 (2000)
Wolf, C., Doermann, D.: Binarization of low quality text using a Markov random field model. In: ICPR (2002)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazán, J., de las Heras, L.: ICDAR 2013 robust reading competition. In: ICDAR (2013)
Tesseract OCR. http://code.google.com/p/tesseract-ocr/
Project website. http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/
Thillou, C., Gosselin, B.: Color binarization for complex camera-based images. In: Electronic imaging (2005)
Kita, K., Wakahara, T.: Binarization of color characters in scene images using k-means clustering and support vector machines. In: ICPR (2010)
Lu, S., Su, B., Tan, C.L.: Document image binarization using background estimation and stroke edges. Int. J. Doc. Anal. Recognit. 13(4), 303–314 (2010)
Kuk, J.G., Cho, N.I.: Feature based binarization of document images degraded by uneven light condition. In: ICDAR (2009)
Peng, X., Setlur, S., Govindaraju, V., Sitaram, R.: Markov random field based binarization for hand-held devices captured document images. In: ICVGIP (2010)
Zhang, H., Liu, C., Yang, C., Ding, X., Wang, K.: An improved scene text extraction method using conditional random field and optical character recognition. In: ICDAR (2011)
Pan, Y.-F., Hou, X., Liu, C.-L.: Text localization in natural scene images based on conditional random field. In: ICDAR (2009)
Hebert, D., Nicolas, S., Paquet, T.: Discrete CRF based combination framework for document image binarization. In: ICDAR (2013)
Gatos, B., Pratikakis, I., Kepene, K., Perantonis, S.: Text detection in indoor/outdoor scene images. In: CBDAR (2005)
Ezaki, N., Bulacu, M., Schomaker, L.: Text detection from natural scene images: towards a system for visually impaired persons. In: ICPR (2004)
Gomez, L., Karatzas, D.: A fast hierarchical method for multi-script and arbitrary oriented scene text extraction. arXiv preprint arXiv:1407.7504 (2014)
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR (2010)
Feild, J., Learned-Miller, E.: Scene text recognition with bilateral regression. University of Massachusetts-Amherst, Computer Science Research Center, Tech. Rep. UM-CS-2012-021 (2013)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Tian, S., Lu, S., Su, B., Tan, C.L.: Scene text segmentation with multi-level maximally stable extremal regions. In: ICPR (2014)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: BMVC (2002)
Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1124–1137 (2004)
Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)
Boros, E., Hammer, P.L.: Pseudo-boolean optimization. Discrete Appl. Math. 123(1), 155–225 (2002)
Reynolds, D.A.: Gaussian mixture models. In: Encyclopedia of Biometrics, Second Edition, pp. 827–832 (2015)
Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: ICDAR (2003)
Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: ICDAR (2011)
Karatzas, D., Mestre, S.R., Mas, J., Nourbakhsh, F., Roy, P.P.: ICDAR 2011 robust reading competition—challenge 1: reading text in born-digital images (web and email). In: ICDAR (2011)
Wang, K., Belongie, S.: Word spotting in the wild. In: ECCV (2010)
ICDAR 2015 Competition on Video Script Identification. http://www.ict.griffith.edu.au/cvsi2015/
ICDAR 2003 dataset. http://algoval.essex.ac.uk/icdar/RobustWord.html
ICDAR 2011 dataset. http://robustreading.opendfki.de/trac/wiki/SceneText
Kumar, D., Prasad, M., Ramakrishnan, A.: Benchmarking recognition results on camera captured word image data sets. In: DAR (2012)
Milyaev, S., Barinova, O., Novikova, T., Kohli, P., Lempitsky, V.: Image binarization for end-to-end text understanding in natural images. In: ICDAR (2013)
Clavelli, A., Karatzas, D., Lladós, J.: A framework for the assessment of text extraction algorithms on complex colour images. In: DAS (2010)
Lopresti, D., Zhou, J.: Locating and recognizing text in WWW images. Inf. Retr. 2(2–3), 177–206 (2000)
Karatzas, D., Antonacopoulos, A.: Colour text segmentation in web images based on human perception. Image Vis. Comput. 25(5), 564–577 (2007)
Kumar, D., Prasad, M.A., Ramakrishnan, A.: NESP: nonlinear enhancement and selection of plane for optimal segmentation and recognition of scene word images. In: IS&T/SPIE Electronic Imaging (2013)
Smith, E.H.B.: An analysis of binarization ground truthing. In: DAS (2010)
Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice-Hall of India Pvt. Ltd, Delhi (2005)
ABBYY Finereader 8.0. http://www.abbyy.com/
Shahab, A., Shafait, F., Dengel, A.: ICDAR2011 robust reading competition challenge 2: Reading text in scene images. In: ICDAR (2011)
Lu, Z., Wu, Z., Brown, M.S.: Directed assistance for ink-bleed reduction in old documents. In: CVPR (2009)
Lu, Z., Wu, Z., Brown, M.S.: Interactive degraded document binarization: an example (and case) for interactive computer vision. In: WACV (2009)
Mishra, A., Alahari, K., Jawahar, C.V.: Top-down and bottom-up cues for scene text recognition. In: CVPR (2012)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: ECCV (2014)
Novikova, T., Barinova, O., Kohli, P., Lempitsky, V.: Large-lexicon attribute-consistent text recognition in natural images. In: ECCV (2012)
Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S., Zhang, Z.: Scene text recognition using part-based tree-structured character detection. In: CVPR (2013)
Jahangiri, M., Heesch, D.: Modified grabcut for unsupervised object segmentation. In: ICIP (2009)
Khattab, D., Ebied, H.M., Hussein, A.S., Tolba, M.F.: Multi-label automatic grabcut for image segmentation. In: HIS (2014)
Khattab, D., Ebied, H.M., Hussein, A.S., Tolba, M.F.: Color image segmentation based on different color space models using automatic GrabCut. Sci. World J. 2014, 10 (2014)
Jegelka, S., Bilmes, J.: Submodularity beyond submodular energies: coupling edges in graph cuts. In: CVPR (2011)
Acknowledgements
This work was partially supported by the Indo-French Project No. 5302-1, EVEREST, funded by CEFIPRA. Anand Mishra was supported by Microsoft Corporation and Microsoft Research India under the Microsoft Research India Ph.D. fellowship award.
Cite this article
Mishra, A., Alahari, K. & Jawahar, C.V. Unsupervised refinement of color and stroke features for text binarization. IJDAR 20, 105–121 (2017). https://doi.org/10.1007/s10032-017-0283-9
DOI: https://doi.org/10.1007/s10032-017-0283-9