TextCatcher: a method to detect curved and challenging text in natural scenes

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this paper, we propose a hybrid, multi-scale text detection algorithm. First, it relies on a connected component-based approach: after the image is segmented, a classification step using a new wavelet descriptor spots the letters. A new graph model and its traversal procedure then form candidate text areas. Second, a texture-based approach discards the false positives. Finally, the detected text areas are precisely cut out and a new binarization step is introduced. The main advantage of our method is that it makes few assumptions. Thus, “challenging texts” such as multi-sized, multi-colored, multi-oriented or curved text can be localized. The efficiency of TextCatcher has been validated on three different datasets: two come from the ICDAR competition, and the third contains photographs we have taken of various everyday texts. We present both qualitative and quantitative results.
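
The abstract does not spell out the graph model or its traversal, so the following is only a minimal sketch of the general idea of grouping letter candidates through a graph, not the method used in TextCatcher. The `Component` class, the linking rule in `are_neighbours` and the thresholds `max_gap`, `max_ratio` and `min_size` are all hypothetical choices made for illustration. Because the rule only looks at pairwise distance and size, and never at a global baseline, a chain of letters along a curve can still end up in one group.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Component:
    """A letter candidate summarised by its centre and height (hypothetical)."""
    cx: float
    cy: float
    height: float


def are_neighbours(a, b, max_gap=1.5, max_ratio=2.0):
    """Assumed linking rule: similar heights and centres close to each other."""
    tall, small = max(a.height, b.height), min(a.height, b.height)
    if tall > max_ratio * small:
        return False
    return hypot(a.cx - b.cx, a.cy - b.cy) <= max_gap * tall


def group_components(components, min_size=3):
    """Return sorted index groups that are connected in the neighbour graph."""
    # Build an undirected graph: one node per component, one edge per linked pair.
    adjacency = {i: set() for i in range(len(components))}
    for i, j in combinations(range(len(components)), 2):
        if are_neighbours(components[i], components[j]):
            adjacency[i].add(j)
            adjacency[j].add(i)

    # Depth-first traversal collects each connected group of nodes.
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        # Very small groups are treated as noise rather than text candidates.
        if len(group) >= min_size:
            groups.append(sorted(group))
    return groups
```

With these assumed thresholds, four letter-sized components laid out along an arc form a single group, while a distant isolated component is dropped because its group is smaller than `min_size`.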

Notes

  1. http://www.itowns.fr/.

  2. The scores of the participating methods are freely available [22].

  3. Dataset is available at https://www.lrde.epita.fr/~jonathan/

  4. https://github.com/mop/LTPTextDetector.

  5. https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp.

References

  1. Abrash, M.: Michael Abrash’s Graphics Programming Black Book, 10th edn. Coriolis Group Books, Scottsdale (1997)

  2. Anthimopoulos, M., Gatos, B., Pratikakis, I.: Detection of artificial and scene text in images and video frames. Pattern Anal. Appl. 16(3), 431–446 (2013)

  3. Arth, C., Limberger, F., Bischof, H.: Real-time license plate recognition on an embedded DSP-platform. In: Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)

  4. Bai, B., Yin, F., Liu, C.L.: Scene text localization using gradient local correlation. In: International Conference on Document Analysis and Recognition, pp. 1380–1384 (2013)

  5. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. Conf. Comput. Vis. Pattern Recognit. 1, 886–893 (2005)

  6. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)

  7. Daubechies, I., Sweldens, W.: Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. 4(3), 245–267 (1998)

  8. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Conference on Computer Vision and Pattern Recognition, pp. 2963–2970 (2010). doi:10.1109/CVPR.2010.5540041

  9. Fabrizio, J., Marcotegui, B., Cord, M.: Text segmentation in natural scenes using toggle-mapping. In: International Conference on Image Processing, pp. 2349–2352 (2009)

  10. Fabrizio, J., Marcotegui, B., Cord, M.: Text detection in street level image. Pattern Anal. Appl. 16(4), 519–533 (2013)

  11. Gao, S., Wang, C., Xiao, B., Shi, C., Zhang, Y., Lv, Z., Shi, Y.: Adaptive scene text detection based on transferring AdaBoost. In: International Conference on Document Analysis and Recognition, pp. 388–392 (2013)

  12. Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR document image binarization contest. In: International Conference on Document Analysis and Recognition (2009)

  13. Gomez, L., Karatzas, D.: Multi-script text extraction from natural scenes. In: International Conference on Document Analysis and Recognition, pp. 467–471 (2013)

  14. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), 610–621 (1973)

  15. Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolution neural network induced MSER trees. In: European Conference on Computer Vision, pp. 497–511 (2014)

  16. Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: European Conference on Computer Vision (2014)

  17. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of ECML, pp. 137–142 (1998)

  18. Jung, K., Kim, I.K., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recogn. 37(5), 977–997 (2004)

  19. Kan, C., Srinath, M.D.: Scene text localization and recognition with oriented stroke detection. In: International Conference on Computer Vision, pp. 97–104 (2013)

  20. Karaoglu, S., Fernando, B., Tremeau, A.: A novel algorithm for text detection and localization in natural scene images. In: Proceedings of DICTA, pp. 635–642 (2010)

  21. Karaoglu, S., Gemert, J., Gevers, T.: Object reading: text recognition for object recognition. In: Proceedings of ECCVW-IFCVCR, pp. 456–465 (2012)

  22. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., de las Heras, L.P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013)

  23. Kasar, T., Agarai, G.: Multi-script and multi-oriented text localization from scene images. In: International Workshop on Camera-Based Document Analysis and Recognition, pp. 1–14 (2012)

  24. Li, R., Wang, S., Shi, Z.: A two level algorithm for text detection in natural scene images. In: International Workshop on Document Analysis Systems (2014)

  25. Li, Y., Shen, C., Jia, W., van den Hengel, A.: Leveraging surrounding context for scene text detection. In: International Conference on Image Processing, pp. 2264–2268 (2013)

  26. Mao, J., Li, H., Zhou, W., Yan, S., Tian, Q.: Scale based region growing for scene text detection. In: International conference on MultiMedia, pp. 1007–1016 (2013)

  27. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: British Machine Vision Conference, pp. 384–393 (2002)

  28. Meng, Q., Song, Y.: Text detection in natural scenes with salient region. In: International Workshop on Document Analysis Systems, pp. 384–388 (2012)

  29. Merino-Gracia, C., Lenc, K., Mirmehdi, M.: A head-mounted device for recognizing text in natural scenes. In: International Workshop on Camera-Based Document Analysis and Recognition, pp. 29–41 (2011)

  30. Minetto, R., Thome, N., Cord, M., Leite, N.J., Stolfi, J.: T-HOG: an effective gradient-based descriptor for single line text regions. Pattern Recogn. 46(3), 1078–1090 (2013)

  31. Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Asian Conference on Computer Vision, pp. 770–783 (2011)

  32. Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Conference on Computer Vision and Pattern Recognition, pp. 3538–3545 (2012)

  33. Neumann, L., Matas, J.: On combining multiple segmentations in scene text recognition. In: International Conference on Document Analysis and Recognition, pp. 523–527 (2013)

  34. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. Pattern Recogn. 29(1), 51–59 (1996)

  35. Olena Team: Milena, generic C++ library for image processing and pattern recognition. https://www.lrde.epita.fr/wiki/Olena/Milena

  36. Opitz, M., Diem, M., Fiel, S., Kleber, F., Sablatnig, R.: End-to-end text recognition with local ternary patterns, MSER and deep convolutional nets. In: International Workshop on Document Analysis Systems (2014)

  37. Phan, T.Q., Shivakumara, P., Tan, C.L.: Detecting text in the real world. In: International conference on MultiMedia, pp. 765–768 (2012)

  38. Prakash, S., Ravishankar, M.: Multi-oriented video text detection and extraction using DCT feature extraction and projection based rotation calculation. In: Proceedings of ICACCI, pp. 714–718 (2013)

  39. Serra, J.: Toggle mappings. In: Simon, J.C. (ed.) From pixels to features, pp. 61–72. Elsevier, North-Holland (1989)

  40. Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S.: Scene text detection using graph model built upon maximally stable extremal regions. Pattern Recogn. Lett. 34(2), 107–116 (2013)

  41. Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S., Zhang, Z.: Scene text recognition using part-based tree-structured character detection. In: Conference on Computer Vision and Pattern Recognition, pp. 2961–2968 (2013)

  42. Shivakumara, P., Basavaraju, H.T., Guru, D.S., Tan, C.L.: Detection of curved text in video: Quad tree based method. In: International Conference on Document Analysis and Recognition, pp. 594–598 (2013)

  43. Sumathi, C.P., Santhanam, T., Gayathri, G.: A survey on various approaches of text extraction in images. Int. J. Comput. Sci. Eng. Surv. 3(4), 27–42 (2012)

  44. Tomer, P., Goyal, A.: Ant clustering based text detection in natural scene images. In: Proceedings of ICCCNT, pp. 1–7 (2013)

  45. Usevitch, B.E.: A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000. IEEE Signal Process. Mag. 18(5), 22–35 (2001)

  46. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: International Conference on Computer Vision, pp. 1457–1464 (2011)

  47. Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: International Conference on Pattern Recognition, pp. 3304–3308 (2012)

  48. Wang, X., Song, Y., Zhang, Y.: Natural scene text detection with multi-channel connected component segmentation. In: International Conference on Document Analysis and Recognition, pp. 1375–1379 (2013)

  49. Wolf, C., Jolion, J.M.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recognit. 8(4), 280–296 (2006)

  50. Xu-Cheng, Y., Xuwang, Y., Kaizhu, H., Hong-Wei, H.: Robust text detection in natural scene images. Pattern Anal. Mach. Intell. 36(5), 970–983 (2013)

  51. Yang, H., Quehl, B., Sack, H.: A framework for improved video text detection and recognition. Multimed. Tools Appl. 69(1), 217–245 (2014)

  52. Yao, C., Xiang, B., Wenyu, L., Yi, M., Zhuowan, T.: Detecting texts of arbitrary orientations in natural images. In: International Conference on Computer Vision, pp. 1083–1090 (2012)

  53. Yi, C., Tian, Y.: Assistive text reading from complex background for blind persons. In: International Workshop on Camera-Based Document Analysis and Recognition, pp. 15–28 (2011)

  54. Zagoris, K., Pratikakis, I.: Text detection in natural images using bio-inspired models. In: International Conference on Document Analysis and Recognition, pp. 1370–1374 (2013)

  55. Zhang, J., Chong, Y.: Text localization based on the discrete shearlet transform. In: ICSESS, pp. 262–266 (2013)

  56. Zhang, J., Kasturi, R.: Extraction of text objects in video documents: recent progress. In: International Workshop on Document Analysis Systems, pp. 5–17 (2008)

  57. Zhang, Y., Huang, K., Liu, C.: Fast and robust graph-based transductive learning via minimum tree cut. In: 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, December 11–14, 2011, pp. 952–961 (2011)

Author information

Corresponding author

Correspondence to Jonathan Fabrizio.

About this article

Cite this article

Fabrizio, J., Robert-Seidowsky, M., Dubuisson, S. et al. TextCatcher: a method to detect curved and challenging text in natural scenes. IJDAR 19, 99–117 (2016). https://doi.org/10.1007/s10032-016-0264-4

  • DOI: https://doi.org/10.1007/s10032-016-0264-4
