Unsupervised refinement of color and stroke features for text binarization

Mishra, Anand; Alahari, Karteek; Jawahar, C. V.

doi:10.1007/s10032-017-0283-9

Original Paper
Published: 03 April 2017

Volume 20, pages 105–121, (2017)

International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Color and strokes are the salient features of text regions in an image. In this work, we use both of these features as cues and introduce a novel energy function to formulate the text binarization problem. The minimum of this energy function corresponds to the optimal binarization. We minimize the energy function with an iterative graph cut-based algorithm. Our model is robust to variations in foreground and background because we learn Gaussian mixture models for color and strokes in each iteration of the graph cut. We show results on word images from the challenging ICDAR 2003/2011, born-digital image and street view text datasets, as well as on full scene images containing text from the ICDAR 2013 datasets, and compare our performance with state-of-the-art methods. Our approach improves significantly on these methods under a variety of performance measures commonly used to assess text binarization schemes. In addition, our method adapts to diverse document images, such as text in videos and handwritten text images.
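The alternation described in the abstract (fit foreground and background models, minimize the energy, repeat) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes a single gray-value feature instead of color and stroke features, one Gaussian per label instead of a Gaussian mixture model, a constant Potts smoothness penalty, and an exhaustive search in place of the graph cut, which is only feasible because the image has six pixels.

```python
import itertools
import math


def gaussian_nll(value, mean, var):
    """Negative log-likelihood of a gray value under one Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * var) + (value - mean) ** 2 / (2.0 * var)


def energy(image, labels, models, smooth_weight):
    """Data term plus a Potts smoothness term on a 4-connected grid."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            mean, var = models[labels[r][c]]
            total += gaussian_nll(image[r][c], mean, var)
            # Each right/down neighbour pair with different labels pays a penalty.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                total += smooth_weight
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                total += smooth_weight
    return total


def minimize_energy(image, models, smooth_weight):
    """Exhaustive search over all labelings (assumed stand-in for a graph cut)."""
    rows, cols = len(image), len(image[0])
    best_labels, best_energy = None, math.inf
    for flat in itertools.product((0, 1), repeat=rows * cols):
        labels = [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]
        value = energy(image, labels, models, smooth_weight)
        if value < best_energy:
            best_labels, best_energy = labels, value
    return best_labels, best_energy


def fit_models(image, labels, previous, min_var=0.01):
    """Re-estimate (mean, variance) per label from the current labeling."""
    models = dict(previous)
    for label in (0, 1):
        values = [v for row, lab_row in zip(image, labels)
                  for v, lab in zip(row, lab_row) if lab == label]
        if values:  # an empty class keeps its previous model
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            models[label] = (mean, max(var, min_var))
    return models


def refine(image, init_labels, iterations=3, smooth_weight=1.0):
    """Alternate model fitting and energy minimization.

    init_labels must contain both labels so that both models exist.
    """
    labels, models, value = init_labels, {}, None
    for _ in range(iterations):
        models = fit_models(image, labels, models)
        labels, value = minimize_energy(image, models, smooth_weight)
    return labels, value


# Toy 2x3 gray image: bright "text" (label 1) on a dark background (label 0).
image = [[0.90, 0.80, 0.10],
         [0.85, 0.20, 0.15]]
rough = [[1, 1, 1],
         [1, 0, 0]]  # deliberately imperfect initial labeling
labels, final_energy = refine(image, rough)
print(labels)
```

On this toy input the dark pixel that was wrongly initialized as text is relabeled as background after the first iteration, and the labeling then stays fixed at `[[1, 1, 0], [1, 0, 0]]`. On real images the exhaustive search would be replaced by a min-cut solver, since the number of labelings grows exponentially with the pixel count.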

Notes

Other color spaces such as CMYK or HSV can also be used.
The stroke-based term is computed similarly with stroke width and intensity of each pixel generated from one of the 2c GMMs.
Source code for all the performance measures used in this work is available on our project Web site [24].
Skeleton, also known as morphological skeleton, is a medial axis representation of a binary image computed with morphological operators [59].
We thank the authors for providing the implementation of their methods.
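The morphological skeleton mentioned in the notes above can be sketched with plain erosions and openings, following the standard textbook construction: the skeleton is the union, over successive erosions of the shape, of each eroded shape minus its opening. The snippet below is a minimal illustration rather than the code used in the paper. The 3x3 square structuring element, the set-of-coordinates representation, and all function names are assumptions made for this example.

```python
# 3x3 square structuring element, centred on the origin (an assumed choice).
SQUARE = tuple((dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1))


def erode(pixels, element=SQUARE):
    """Keep pixels whose entire neighbourhood lies in the foreground."""
    return {(r, c) for (r, c) in pixels
            if all((r + dr, c + dc) in pixels for dr, dc in element)}


def dilate(pixels, element=SQUARE):
    """Grow the foreground by the structuring element."""
    return {(r + dr, c + dc) for (r, c) in pixels for dr, dc in element}


def morphological_skeleton(pixels):
    """Union over k of (shape eroded k times) minus its opening."""
    skeleton = set()
    current = set(pixels)
    while current:
        opened = dilate(erode(current))  # opening = erosion then dilation
        skeleton |= current - opened
        current = erode(current)
    return skeleton


# A filled 3x5 rectangle, given as a set of (row, column) foreground pixels.
rectangle = {(r, c) for r in range(3) for c in range(5)}
skeleton = morphological_skeleton(rectangle)
print(sorted(skeleton))
```

For the 3x5 rectangle this prints `[(1, 1), (1, 2), (1, 3)]`, the middle row without its two end pixels. A different structuring element, such as a cross, gives a different skeleton for the same shape.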

References

Chen, Y., Wang, L.: Broken and degraded document images binarization. Neurocomputing 237, 272–280 (2017)
Jia, F., Shi, C., He, K., Wang, C., Xiao, B.: Document image binarization using structural symmetry of strokes. In: ICFHR, pp. 411–416 (2016)
Stathis, P., Kavallieratou, E., Papamarkos, N.: An evaluation technique for binarization algorithms. J. Univers. Comput. Sci. 14(18), 3011–3030 (2008)
Howe, N.R.: A Laplacian energy for document binarization. In: ICDAR (2011)
Howe, N.R.: Document binarization with automatic parameter tuning. Int. J. Doc. Anal. Recognit. 16(3), 247–258 (2013)
Valizadeh, M., Kabir, E.: Binarization of degraded document image based on feature space partitioning and classification. Int. J. Doc. Anal. Recognit. 15(1), 57–69 (2012)
Lazzara, G., Géraud, T.: Efficient multiscale Sauvola’s binarization. Int. J. Doc. Anal. Recognit. 17(2), 105–123 (2014)
Mishra, A., Alahari, K., Jawahar, C.V.: An MRF model for binarization of natural scene text. In: ICDAR (2011)
Milyaev, S., Barinova, O., Novikova, T., Kohli, P., Lempitsky, V.: Fast and accurate scene text understanding with image binarization and off-the-shelf OCR. Int. J. Doc. Anal. Recognit. 18(2), 169–182 (2015)
Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICDAR 2013 document image binarization contest (DIBCO 2013). In: ICDAR (2013)
Pratikakis, I., Gatos, B., Ntirogiannis, K.: ICFHR 2012 competition on handwritten document image binarization. In: ICFHR (2012)
Ntirogiannis, K., Gatos, B., Pratikakis, I.: ICFHR2014 competition on handwritten document image binarization (H-DIBCO 2014). In: ICFHR (2014)
Boykov, Y.Y., Jolly, M.-P.: Interactive graph cuts for optimal boundary and region segmentation of objects in ND images. In: ICCV (2001)
Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
Blake, A., Rother, C., Brown, M., Perez, P., Torr, P.: Interactive image segmentation using an adaptive GMMRF model. In: ECCV (2004)
Kittler, J., Illingworth, J., Föglein, J.: Threshold selection based on a simple image statistic. Comput. Vis. Graph. Image Process. 30(2), 125–147 (1985)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Kasar, T., Kumar, J., Ramakrishnan, A.: Font and background color independent text binarization. In: CBDAR (2007)
Niblack, W.: An Introduction to Digital Image Processing. Strandberg Publishing Company (1985)
Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recognit. 33(2), 225–236 (2000)
Wolf, C., Doermann, D.: Binarization of low quality text using a Markov random field model. In: ICPR (2002)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazán, J., de las Heras, L.: ICDAR 2013 robust reading competition. In: ICDAR (2013)
Tesseract OCR. http://code.google.com/p/tesseract-ocr/
Project website. http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/
Thillou, C., Gosselin, B.: Color binarization for complex camera-based images. In: Electronic imaging (2005)
Kita, K., Wakahara, T.: Binarization of color characters in scene images using k-means clustering and support vector machines. In: ICPR (2010)
Lu, S., Su, B., Tan, C.L.: Document image binarization using background estimation and stroke edges. Int. J. Doc. Anal. Recognit. 13(4), 303–314 (2010)
Kuk, J.G., Cho, N.I.: Feature based binarization of document images degraded by uneven light condition. In: ICDAR (2009)
Peng, X., Setlur, S., Govindaraju, V., Sitaram, R.: Markov random field based binarization for hand-held devices captured document images. In: ICVGIP (2010)
Zhang, H., Liu, C., Yang, C., Ding, X., Wang, K.: An improved scene text extraction method using conditional random field and optical character recognition. In: ICDAR (2011)
Pan, Y.-F., Hou, X., Liu, C.-L.: Text localization in natural scene images based on conditional random field. In: ICDAR (2009)
Hebert, D., Nicolas, S., Paquet, T.: Discrete CRF based combination framework for document image binarization. In: ICDAR (2013)
Gatos, B., Pratikakis, I., Kepene, K., Perantonis, S.: Text detection in indoor/outdoor scene images. In: CBDAR (2005)
Ezaki, N., Bulacu, M., Schomaker, L.: Text detection from natural scene images: towards a system for visually impaired persons. In: ICPR (2004)
Gomez, L., Karatzas, D.: A fast hierarchical method for multi-script and arbitrary oriented scene text extraction. arXiv preprint arXiv:1407.7504 (2014)
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR (2010)
Feild, J., Learned-Miller, E.: Scene text recognition with bilateral regression. University of Massachusetts-Amherst, Computer Science Research Center, Tech. Rep. UM-CS-2012-021 (2013)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Tian, S., Lu, S., Su, B., Tan, C.L.: Scene text segmentation with multi-level maximally stable extremal regions. In: ICPR (2014)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: BMVC (2002)
Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1124–1137 (2004)
Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)
Boros, E., Hammer, P.L.: Pseudo-boolean optimization. Discrete Appl. Math. 123(1), 155–225 (2002)
Reynolds, D.A.: Gaussian mixture models. In: Encyclopedia of Biometrics, Second Edition, pp. 827–832 (2015)
Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: ICDAR (2003)
Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: ICDAR (2011)
Karatzas, D., Mestre, S.R., Mas, J., Nourbakhsh, F., Roy, P.P.: ICDAR 2011 robust reading competition—challenge 1: reading text in born-digital images (web and email). In: ICDAR (2011)
Wang, K., Belongie, S.: Word spotting in the wild. In: ECCV (2010)
ICDAR 2015 Competition on Video Script Identification. http://www.ict.griffith.edu.au/cvsi2015/
ICDAR 2003 dataset. http://algoval.essex.ac.uk/icdar/RobustWord.html
ICDAR 2011 dataset. http://robustreading.opendfki.de/trac/wiki/SceneText
Kumar, D., Prasad, M., Ramakrishnan, A.: Benchmarking recognition results on camera captured word image data sets. In: DAR (2012)
Milyaev, S., Barinova, O., Novikova, T., Kohli, P., Lempitsky, V.: Image binarization for end-to-end text understanding in natural images. In: ICDAR (2013)
Clavelli, A., Karatzas, D., Lladós, J.: A framework for the assessment of text extraction algorithms on complex colour images. In: DAS (2010)
Lopresti, D., Zhou, J.: Locating and recognizing text in WWW images. Inf. Retr. 2(2–3), 177–206 (2000)
Karatzas, D., Antonacopoulos, A.: Colour text segmentation in web images based on human perception. Image Vis. Comput. 25(5), 564–577 (2007)
Kumar, D., Prasad, M.A., Ramakrishnan, A.: NESP: nonlinear enhancement and selection of plane for optimal segmentation and recognition of scene word images. In: IS&T/SPIE Electronic Imaging (2013)
Smith, E.H.B.: An analysis of binarization ground truthing. In: DAS (2010)
Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice-Hall of India Pvt. Ltd, Delhi (2005)
ABBYY Finereader 8.0. http://www.abbyy.com/
Shahab, A., Shafait, F., Dengel, A.: ICDAR2011 robust reading competition challenge 2: Reading text in scene images. In: ICDAR (2011)
Lu, Z., Wu, Z., Brown, M.S.: Directed assistance for ink-bleed reduction in old documents. In: CVPR (2009)
Lu, Z., Wu, Z., Brown, M.S.: Interactive degraded document binarization: an example (and case) for interactive computer vision. In: WACV (2009)
Mishra, A., Alahari, K., Jawahar, C.: Top-down and bottom-up cues for scene text recognition. In: CVPR (2012)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: ECCV (2014)
Novikova, T., Barinova, O., Kohli, P., Lempitsky, V.: Large-lexicon attribute-consistent text recognition in natural images. In: ECCV (2012)
Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S., Zhang, Z.: Scene text recognition using part-based tree-structured character detection. In: CVPR (2013)
Jahangiri, M., Heesch, D.: Modified grabcut for unsupervised object segmentation. In: ICIP (2009)
Khattab, D., Ebied, H.M., Hussein, A.S., Tolba, M.F.: Multi-label automatic grabcut for image segmentation. In: HIS (2014)
Khattab, D., Ebied, H.M., Hussein, A.S., Tolba, M.F.: Color image segmentation based on different color space models using automatic GrabCut. Sci. World J. 2014, 10 (2014)
Jegelka, S., Bilmes, J.: Submodularity beyond submodular energies: coupling edges in graph cuts. In: CVPR (2011)

Acknowledgements

This work was partially supported by the Indo-French Project No. 5302-1, EVEREST, funded by CEFIPRA. Anand Mishra was supported by Microsoft Corporation and Microsoft Research India under the Microsoft Research India Ph.D. fellowship award.

Author information

Authors and Affiliations

Center for Visual Information Technology, IIIT Hyderabad, Hyderabad, India
Anand Mishra & C. V. Jawahar
Thoth Team, Inria, Grenoble, France
Karteek Alahari
Laboratoire Jean Kuntzmann, Université Grenoble Alpes, Grenoble, France
Karteek Alahari

Corresponding author

Correspondence to Anand Mishra.

About this article

Cite this article

Mishra, A., Alahari, K. & Jawahar, C.V. Unsupervised refinement of color and stroke features for text binarization. IJDAR 20, 105–121 (2017). https://doi.org/10.1007/s10032-017-0283-9

Received: 04 December 2015
Revised: 28 February 2017
Accepted: 15 March 2017
Published: 03 April 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10032-017-0283-9
