
Text Localization and Recognition in Images and Video

  • Seiichi Uchida
Reference work entry

Abstract

This chapter reviews techniques for text localization and recognition in scene images captured by a camera. Because scene text differs from text in scanned documents in many respects, specific techniques are necessary to localize and recognize it. Localization of scene text is both difficult and important: in general, there is no prior information on the location, layout, direction, size, typeface, or color of text in a scene image, and many textures and patterns resemble characters. Recognition of scene text is also difficult because many characters are distorted by blurring, perspective, nonuniform lighting, and low resolution. Decoration of characters makes recognition far more difficult. As reviewed in this chapter, these tasks have been tackled not only with modified versions of conventional OCR techniques but also with state-of-the-art computer vision and pattern recognition methodologies.

Keywords

Scene text detection; Scene text recognition; Text image acquisition; Text localization; Text/non-text discrimination; Video caption detection; Video caption recognition

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. Department of Advanced Information Technology, Kyushu University, Nishi-ku, Fukuoka, Japan
