Pattern Analysis and Applications, Volume 16, Issue 4, pp 519–533

Text detection in street level images

  • Jonathan Fabrizio
  • Beatriz Marcotegui
  • Matthieu Cord
Theoretical Advances


Detecting text in natural images is a very challenging computer vision task. Image acquisition introduces perspective distortion, blur, and illumination changes, and characters may vary widely in shape, size, and color. In this article we introduce a complete text detection scheme. Our architecture combines a hypothesis generation step, which produces candidate text boxes, with a hypothesis validation step, which filters out false detections. Hypothesis generation relies on a new, efficient segmentation method based on a morphological operator. The segmented regions are then filtered and classified using shape descriptors based on Fourier and pseudo-Zernike moments and on an original rotation-invariant polar descriptor. Classification relies on three SVM classifiers combined in a late fusion scheme. Detected characters are finally grouped to generate the text box hypotheses. The validation step applies a global SVM classification to the content of each box, using dedicated descriptors adapted from the HOG approach. Results reported on the well-known ICDAR database show that our method is competitive. The evaluation protocol and metrics are discussed in depth, and results on a very challenging street-level database are also presented.
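The abstract says only that segmentation is "based on a morphological operator"; the keywords point to toggle mapping. As a rough illustration of that general idea, and not the authors' TMMS implementation, the sketch below labels each pixel by comparing it with the local minimum (erosion) and local maximum (dilation) of its neighbourhood. The function names, the square window, and the parameters `c_min` and `radius` are assumptions made for this example.

```python
def local_extrema(img, radius=1):
    """Return (erosion, dilation): min and max over a square window."""
    h, w = len(img), len(img[0])
    ero = [[0] * w for _ in range(h)]
    dil = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Window is clipped at the image borders.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            ero[y][x] = min(window)
            dil[y][x] = max(window)
    return ero, dil


def toggle_segment(img, c_min=20, radius=1):
    """Label pixels: 0 = low local contrast, 1 = dark side, 2 = bright side."""
    ero, dil = local_extrema(img, radius)
    labels = []
    for y, row in enumerate(img):
        out = []
        for x, v in enumerate(row):
            lo, hi = ero[y][x], dil[y][x]
            if hi - lo < c_min:
                out.append(0)   # homogeneous neighbourhood (assumed threshold)
            elif v - lo < hi - v:
                out.append(1)   # pixel is closer to the erosion
            else:
                out.append(2)   # pixel is closer to the dilation
        labels.append(out)
    return labels


# Toy image: a dark band next to a bright band.
image = [[10, 10, 10, 200, 200, 200] for _ in range(3)]
print(toggle_segment(image))
```

On this toy image, only the pixels on either side of the vertical edge receive labels 1 and 2; the flat areas are labelled 0. A real pipeline would then extract connected components of the labelled regions as character candidates.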


Text detection, Text segmentation, TMMS, Toggle mapping, Image classification



This work is funded by ANR, ITOWNS project 07-MDCO-007-03 [1, 22].


  1. The French National Research Agency (ANR).
  2. Arth C, Limberger F, Bischof H (2007) Real-time license plate recognition on an embedded DSP-platform. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR ’07), pp 1–8
  3. Beucher S (2007) Numerical residues. Image Vis Comput 25(4):405–415. doi: 10.1016/j.imavis.2006.07.020
  4. Breen EJ, Jones R (1996) Attribute openings, thinnings, and granulometries. Comput Vis Image Underst 64(3):377–389
  5. Chehdi K, Coquin D (1991) Binarisation d’images par seuillage local optimal maximisant un critère d’homogénéité. GRETSI
  6. Chen D, Odobez J, Thiran J (2004) A localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning method. Image Commun 19(3):205–217
  7. Chen X, Yuille AL (2004) Detecting and reading text in natural scenes. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2:366–373. doi: 10.1109/CVPR.2004.77
  8. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
  9. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE CVPR. IEEE Computer Society, pp 886–893
  10. Ezaki N, Bulacu M, Schomaker L (2004) Text detection from natural scene images: towards a system for visually impaired persons. In: 17th International conference on pattern recognition, vol 2, pp 683–686
  11. Fabrizio J, Cord M, Marcotegui B (2009) Text extraction from street level images. ISPRS Workshop CMRT
  12. Fabrizio J, Marcotegui B (2009) Fast implementation of the ultimate opening. International symposium on mathematical morphology, pp 272–281
  13. Fabrizio J, Marcotegui B, Cord M (2009) Text segmentation in natural scenes using toggle-mapping. 2009 IEEE International Conference on Image Processing
  14. Garcia WC, Apostolidis X (2000) Text detection and segmentation in complex color images. In: IEEE International Conference on Acoustic Speech, Signal Processing, pp 2326–2329. IEEE Computer Society
  15. Gatos B, Ntirogiannis K, Pratikakis I (2009) ICDAR 2009 document image binarization contest (DIBCO 2009). International Conference on Document Analysis and Recognition
  16. Gatos B, Ntirogiannis K, Pratikakis I (2010) DIBCO 2009: document image binarization contest. Int J Doc Anal Recognit. doi: 10.1007/s10032-010-0115-7
  17. Gosselin P, Cord M (2008) Active learning methods for interactive image retrieval. IEEE Trans Image Process 17(7):1200–1211
  18. Hanif SM, Prevost L (2007) Texture based text detection in natural scene images - a help to blind and visually impaired persons. In: Conference on assistive technologies for people with vision and hearing impairments
  19. ICDAR: Robust reading and locating database (2003).
  20. Institut géographique national (IGN).
  21. Imageval (2006).
  22. ANR ITOWNS project.
  23. Joachims: SVMlight.
  24. Joachims T (1999) Making large-scale SVM learning practical. Advances in kernel methods: support vector learning, pp 169–184
  25. Jung C, Liu Q, Kim J (2009) A stroke filter and its application to text localization. Pattern Recogn Lett 30(2):114–122. doi: 10.1016/j.patrec.2008.05.014
  26. Jung K, Kim K, Jain A (2004) Text information extraction in images and video: a survey. Pattern Recogn Lett 37(5):977–997
  27. Kavallieratou E, Balcan D, Popa M, Fakotakis N (2001) Handwritten text localization in skewed documents. In: International conference on image processing, pp I: 1102–1105
  28. Kuncheva L (2004) Combining pattern classifiers: methods and algorithms. Wiley, Hoboken
  29. Liang J, Doermann D, Li H (2005) Camera-based analysis of text and documents: a survey. Int J Doc Anal Recogn 7(2–3):83–104
  30. Lienhart R, Effelsberg W (2000) Automatic text segmentation and text recognition for video indexing. Multim Syst 8(1):69–81. doi: 10.1007/s005300050006
  31. Liu Q, Jung C, Kim S, Moon Y, Kim JY (2006) Stroke filter for text localization in video images. IEEE international conference on image processing
  32. Liu X, Samarabandu J (2006) Multiscale edge based text extraction from complex images. In: International conference on multimedia expo, pp 1721–1724
  33. Lucas S (2005) ICDAR 2005 text locating competition results. Eighth international conference on document analysis and recognition
  34. Mancas-Thillou C (2006) Natural scene text understanding. Ph.D. thesis, TCTS Lab of the Faculté Polytechnique de Mons, Belgium
  35. Niblack W (1986) An introduction to image processing. Prentice-Hall, Englewood Cliffs
  36. Otsu N (1979) A threshold selection method from gray level histogram. IEEE Trans Syst Man Cybern 9:62–66
  37. Palumbo PW, Srihari SN, Soh J, Sridhar R, Demjanenko V (1992) Postal address block location in real time. Computer 25(7):34–42. doi: 10.1109/2.144438
  38. Pan W, Bui TD, Suen CY (2009) Text detection from natural scene images using topographic maps and sparse representations. In: IEEE ICIP. IEEE Computer Society
  39. Pazio M, Niedźwiecki M, Kowalik R, Lebiedź J (2007) Text detection system for the blind. 15th European signal processing conference (EUSIPCO), pp 272–276
  40. Retornaz T (2007) Détection de textes enfouis dans des bases d’images généralistes. Un descripteur sémantique pour l’indexation. Ph.D. thesis, Ecole Nationale Supérieure des Mines de Paris, C.M.M., Fontainebleau
  41. Retornaz T, Marcotegui B (2007) Scene text localization based on the ultimate opening. Int Symp Math Morphol 1:177–188
  42. Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn Lett 33:225–236
  43. Sauvola JJ, Seppänen T, Haapakoski S, Pietikäinen M (1997) Adaptive document binarization. In: ICDAR ’97: Proceedings of the 4th International Conference on Document Analysis and Recognition, pp 147–152. IEEE Computer Society, Washington, DC
  44. Seeger M, Dance C (2001) Binarising camera images for OCR. Proceedings of the sixth international conference on document analysis and recognition (ICDAR)
  45. Serra J (1989) Toggle mappings. In: Simon JC (ed) From pixels to features. Elsevier, North-Holland, pp 61–72
  46. Sezgin M, Sankur B (2004) Survey over image thresholding techniques and quantitative performance evaluation. J Electron Imaging 13(1):146–165
  47. Shafait F, Keysers D, Breuel TM (2008) Efficient implementation of local adaptive thresholding techniques using integral images. In: Document Recognition and Retrieval XV. San Jose
  48. Szumilas L (2008) Scale and rotation invariant shape matching. Ph.D. thesis, Technische Universität Wien, Fakultät für Informatik
  49. Trier OD, Jain AK, Taxt T (1996) Feature extraction methods for character recognition: a survey. Pattern Recogn 29(4):641–662. doi: 10.1016/0031-3203(95)00118-2
  50. Viola P, Jones M (2001) Robust real-time object detection. Int J Comput Vis
  51. Wahl F, Wong K, Casey R (1982) Block segmentation and text extraction in mixed text/image documents. Comput Graph Image Process 20(4):375–390
  52. Wolf C, Jolion JM, Chassaing F (2002) Text localization, enhancement and binarization in multimedia documents. In: Proceedings of the international conference on pattern recognition (ICPR) 2002, pp 1037–1040
  53. Wolf C, Jolion JM (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4):280–296
  54. Xiao Y, Yan H (2003) Text region extraction in a document image based on the Delaunay tessellation. Pattern Recogn Lett 36(3):799–809
  55. Zhao XK, Lin YF, Hu Y Liu YTH (2011) Text from corners: a novel approach to detect text and caption in videos. IEEE Trans Image Process 20(3):790–799
  56. Zhu KF Qi, RJ, Xu L, Kimachi M, Wu Y, Aziwa T (2005) Using AdaBoost to detect and segment characters from natural scenes. In: Proceedings of CBDAR, ICDAR Workshop

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Jonathan Fabrizio (1), corresponding author
  • Beatriz Marcotegui (2)
  • Matthieu Cord (3)
  1. LRDE-EPITA Lab, Le Kremlin Bicetre Cedex, France
  2. Mines ParisTech, CMM - Centre de Morphologie Mathématique, Mathématiques et Systèmes, Fontainebleau-CEDEX, France
  3. UPMC-Sorbonne Universités, LIP6 Lab, Paris, France
