Fast and accurate scene text understanding with image binarization and off-the-shelf OCR

  • Sergey Milyaev
  • Olga Barinova
  • Tatiana Novikova
  • Pushmeet Kohli
  • Victor Lempitsky
Original Paper

Abstract

While modern off-the-shelf OCR engines achieve very high accuracy on scanned text, text detection and recognition in natural images remain a challenging problem. Here, we demonstrate that OCR engines can still perform well on this harder task as long as an appropriate image binarization is applied to the input photographs. We propose a new binarization algorithm that is particularly suitable for scene text and systematically evaluate its performance along with 12 existing binarization methods. While most existing binarization techniques are designed specifically either for text detection or for recognition of localized text, our method gives very similar results on both large images and localized text regions. It can therefore be applied to large images directly, with no need to re-binarize localized text regions. We also propose a real-time variant of this method based on linear-time bilateral filtering. Evaluation across different metrics on established natural image text recognition benchmarks (ICDAR 2003 and ICDAR 2011) shows that our simple and fast image binarization method, combined with an off-the-shelf OCR engine, achieves state-of-the-art performance for end-to-end text understanding in natural images and outperforms recent, more complex methods.
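The abstract does not describe the proposed binarization algorithm itself, so the sketch below does not attempt to reproduce it, nor the bilateral-filtering variant. As an illustration of what a local binarization step computes, it implements Sauvola's classic adaptive threshold, T = m(1 + k(s/R - 1)), where m and s are the mean and standard deviation of a window around each pixel. Window statistics come from integral images, so the cost is linear in the number of pixels regardless of window size. The function names, the window radius, k = 0.5, R = 128 and the assumption of dark text on a lighter background are choices made for this example, not details taken from the paper.

```python
from typing import List, Tuple

Image = List[List[float]]
Table = List[List[float]]


def integral_images(img: Image) -> Tuple[Table, Table]:
    """Return summed-area tables of pixel values and of squared values.

    Each table has an extra leading row and column of zeros, so box sums
    need no boundary special cases.
    """
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    s2 = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        row_sum2 = 0.0
        for x in range(w):
            v = img[y][x]
            row_sum += v
            row_sum2 += v * v
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
            s2[y + 1][x + 1] = s2[y][x + 1] + row_sum2
    return s, s2


def box_sum(table: Table, y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over rows y0..y1-1 and columns x0..x1-1 with four lookups."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def sauvola_binarize(img: Image, radius: int = 7,
                     k: float = 0.5, r: float = 128.0) -> List[List[int]]:
    """Return 1 for text (dark) pixels and 0 for background.

    Assumption: dark text on a lighter background; k and r are the
    conventional Sauvola defaults, not values from the paper.
    The window is (2*radius+1)^2 pixels, clipped at the image border.
    """
    h, w = len(img), len(img[0])
    s, s2 = integral_images(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = box_sum(s, y0, x0, y1, x1) / n
            var = box_sum(s2, y0, x0, y1, x1) / n - mean * mean
            std = max(var, 0.0) ** 0.5  # guard against tiny negative rounding
            threshold = mean * (1.0 + k * (std / r - 1.0))
            out[y][x] = 1 if img[y][x] < threshold else 0
    return out


if __name__ == "__main__":
    # A 5x5 patch: one dark vertical stroke (30) on a light background (200).
    patch = [[30 if x == 2 else 200 for x in range(5)] for _ in range(5)]
    for row in sauvola_binarize(patch, radius=2):
        print(row)
```

For light text on a dark background, the same function can be applied to the inverted image (255 minus each value). In an end-to-end setting, the resulting mask would be rendered as black text on white and passed to an OCR engine; that step is not shown here.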

Keywords

Natural scene binarization, Scene text localization, Document binarization

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Sergey Milyaev (1)
  • Olga Barinova (1)
  • Tatiana Novikova (1)
  • Pushmeet Kohli (2)
  • Victor Lempitsky (3)

  1. Lomonosov Moscow State University, Moscow, Russia
  2. Microsoft Research, Cambridge, UK
  3. Skolkovo Institute of Science and Technology, Moscow, Russia