Fast and accurate scene text understanding with image binarization and off-the-shelf OCR
Abstract
While modern off-the-shelf OCR engines achieve high accuracy on scanned text, text detection and recognition in natural images remain challenging. Here, we demonstrate that OCR engines can still perform well on this harder task as long as an appropriate image binarization is applied to the input photographs. We propose a new binarization algorithm that is particularly suitable for scene text and systematically evaluate its performance alongside 12 existing binarization methods. While most existing binarization techniques are designed specifically either for text detection or for recognition of localized text, our method shows very similar results for both large images and localized text regions. Therefore, it can be applied to large images directly, with no need to re-binarize localized text regions. We also propose a real-time variant of this method based on linear-time bilateral filtering. Evaluation across different metrics on established natural image text recognition benchmarks (ICDAR 2003 and ICDAR 2011) shows that our simple and fast image binarization method, combined with an off-the-shelf OCR engine, achieves state-of-the-art performance for end-to-end text understanding in natural images and outperforms more elaborate recent methods.
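To make the "binarize first, then run OCR" idea concrete, the following is a minimal sketch of a classic Niblack-style local threshold. It is not the binarization method proposed in the paper, and the function names (`box_mean`, `niblack_binarize`) and the parameter defaults (`radius=7`, `k=-0.2`) are assumptions chosen for illustration only. It assumes dark text on a lighter background.

```python
import numpy as np


def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, computed with an integral image.

    The image is edge-padded so the output has the same shape as the input.
    """
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def niblack_binarize(gray, radius=7, k=-0.2):
    """Return a boolean mask that is True where a pixel is classified as text.

    A pixel is marked as text when it is darker than the local threshold
    mean + k * std (illustrative defaults, not values from the paper).
    """
    gray = gray.astype(np.float64)
    mean = box_mean(gray, radius)
    sq_mean = box_mean(gray * gray, radius)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    return gray < mean + k * std


# Synthetic example: a dark vertical stroke on a bright background.
image = np.full((40, 40), 200, dtype=np.uint8)
image[:, 18:21] = 50
mask = niblack_binarize(image)
```

In a full pipeline the resulting mask would be converted to a black-and-white image and handed to an OCR engine. On real photographs a plain local threshold like this one tends to produce noise in flat regions, which is part of why scene text calls for more careful binarization.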
Keywords
Natural scene binarization, Scene text localization, Document binarization