Scene Text Detection Based on Robust Stroke Width Transform and Deep Belief Network

  • Conference paper
Computer Vision -- ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9004)

Abstract

Text detection in natural scene images is an open and challenging problem because the appearance of text varies significantly, as does its interaction with the surrounding context. In this paper, we present a novel text detection method that combines two main ingredients: a robust extension of the Stroke Width Transform (SWT) and Deep Belief Network (DBN) based discrimination of text objects from other scene components. In the former, smoothness-based edge information is combined with gradient information to generate high-quality edge images, and various edge cues are exploited in SWT-based Connected Component (CC) analysis to eliminate inter-character and intra-character errors. In the latter, a DBN is used to learn efficient representations that discriminate character CCs from non-character CCs, resulting in improved detection accuracy. The proposed method is evaluated on the public ICDAR and SVT datasets and achieves state-of-the-art results, demonstrating its effectiveness.
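To make the stroke-width idea concrete, the following is a minimal sketch of the basic SWT ray-casting step, not the robust extension proposed in the paper. It assumes a binary image with bright strokes on a dark background, uses stroke-boundary pixels in place of a real edge detector, and omits the median refinement pass; the function name `stroke_width_transform` and the parameter `max_len` are hypothetical and chosen only for illustration.

```python
import math

def stroke_width_transform(img, max_len=50):
    """Simplified SWT sketch for bright strokes (1) on a dark background (0).

    Returns a grid of the same size in which pixels crossed by a matched ray
    hold the estimated stroke width; all other pixels hold math.inf.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so the border replicates the nearest pixel.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Central-difference gradient; it points from dark towards bright.
    gx = [[(px(y, x + 1) - px(y, x - 1)) / 2 for x in range(w)] for y in range(h)]
    gy = [[(px(y + 1, x) - px(y - 1, x)) / 2 for x in range(w)] for y in range(h)]

    def is_edge(y, x):
        # Assumption: an edge pixel is a stroke pixel with a background 4-neighbour.
        if img[y][x] == 0:
            return False
        for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 0:
                return True
        return False

    swt = [[math.inf] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not is_edge(y, x):
                continue
            norm = math.hypot(gx[y][x], gy[y][x])
            if norm == 0:
                continue
            dx, dy = gx[y][x] / norm, gy[y][x] / norm
            ray = [(y, x)]
            # Walk along the gradient direction until the opposite edge is met.
            for step in range(1, max_len + 1):
                cx = int(round(x + dx * step))
                cy = int(round(y + dy * step))
                if not (0 <= cx < w and 0 <= cy < h) or img[cy][cx] == 0:
                    break  # left the image or the stroke without a match
                ray.append((cy, cx))
                if is_edge(cy, cx):
                    # Accept the match only if the gradients roughly oppose.
                    if gx[cy][cx] * dx + gy[cy][cx] * dy < 0:
                        width = math.hypot(cx - x, cy - y)
                        for ry, rx in ray:
                            swt[ry][rx] = min(swt[ry][rx], width)
                    break
    return swt

# A vertical bar five pixels wide (columns 3 to 7) spanning the full height.
bar = [[1 if 3 <= x <= 7 else 0 for x in range(11)] for _ in range(5)]
swt = stroke_width_transform(bar)
print(swt[2][3:8])  # [4.0, 4.0, 4.0, 4.0, 4.0]
```

In this sketch the width is the distance between the centres of the two opposing edge pixels, so a bar five pixels wide yields 4.0. Pixels with near-constant values like these are what a connected-component stage could then group into character candidates.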

Notes

  1. http://en.wikipedia.org/wiki/Color_difference

Acknowledgement

Research supported by the National Science Foundation of China under Grant Nos. 61003113, 61272218 and 61321491.

Author information

Corresponding author

Correspondence to Feng Su.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, H., Xue, L., Su, F. (2015). Scene Text Detection Based on Robust Stroke Width Transform and Deep Belief Network. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9004. Springer, Cham. https://doi.org/10.1007/978-3-319-16808-1_14

  • DOI: https://doi.org/10.1007/978-3-319-16808-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16807-4

  • Online ISBN: 978-3-319-16808-1

  • eBook Packages: Computer Science, Computer Science (R0)
