Scene Text Detection Based on Robust Stroke Width Transform and Deep Belief Network

  • Hailiang Xu
  • Like Xue
  • Feng Su
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9004)


Text detection in natural scene images is an open and challenging problem due to the significant variations in the appearance of the text itself and in its interaction with the context. In this paper, we present a novel text detection method combining two main ingredients: a robust extension of the Stroke Width Transform (SWT) and Deep Belief Network (DBN) based discrimination of text objects from other scene components. In the former, smoothness-based edge information is combined with gradient information to generate high-quality edge images, and various edge cues are exploited in Connected Component (CC) analysis on the basis of SWT to eliminate inter-character and intra-character errors. In the latter, a DBN is used to learn efficient representations that discriminate character CCs from non-character CCs, resulting in improved detection accuracy. The proposed method is evaluated on the ICDAR and SVT public datasets and achieves state-of-the-art results, demonstrating the effectiveness of the method.
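To make the stroke-width idea behind the CC analysis concrete, the following is a minimal sketch and not the method of the paper. It assumes a binary component mask, approximates stroke width by horizontal run lengths (the actual SWT measures width along gradient rays between opposing edge pixels), and keeps a component only if its widths are nearly constant. The function names and the `max_cv` threshold are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def horizontal_stroke_widths(mask):
    """Return the length of every horizontal run of foreground pixels.

    `mask` is a list of rows of 0/1 values. Run lengths are a crude
    stand-in for SWT values (assumption: the real transform follows
    gradient rays rather than image rows).
    """
    widths = []
    for row in mask:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                widths.append(run)
                run = 0
        if run:  # close a run that reaches the right border
            widths.append(run)
    return widths


def looks_like_stroke(mask, max_cv=0.5):
    """Accept a component whose stroke widths vary little.

    `max_cv` is a hypothetical threshold on the coefficient of
    variation (standard deviation divided by mean) of the widths.
    """
    widths = horizontal_stroke_widths(mask)
    if not widths:
        return False
    return pstdev(widths) / mean(widths) <= max_cv


# A vertical bar of constant width 2, similar to a character stroke.
bar = [[0, 1, 1, 0, 0, 0] for _ in range(4)]
# A shape whose width jumps from 1 to 6, unlike a single stroke.
wedge = [[1] * n + [0] * (6 - n) for n in (1, 1, 1, 6, 6, 6)]

print(looks_like_stroke(bar))
print(looks_like_stroke(wedge))
```

In this sketch the bar passes the test because all of its runs have the same length, while the wedge is rejected because its run lengths vary widely. A learned classifier such as the DBN described above would replace this hand-set threshold with features learned from labelled character and non-character CCs.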



Research supported by the National Science Foundation of China under Grant Nos. 61003113, 61272218 and 61321491.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
