
A Survey on Text Information Extraction from Born-Digital and Scene Text Images

  • S. P. Faustina Joan
  • S. Valli
Review Article

Abstract

Text information extraction (TIE) from images remains an open research area because of unsolved challenges arising from the heterogeneity of image types, the mode of image capture, the position of the text and the clarity of the text information. The number of images captured with mobile phones is now enormous. The text in such images can provide valuable input to the user as well as to applications that depend on image text information. Text is a primary channel of human communication, and images containing text can aid the semantic understanding of the image. Types of image text are explored, along with an introduction to TIE and its applications. Text detection is emphasized, and an attempt is made to categorize the features used for text detection. After a brief discussion of early research works, the available datasets and performance metrics are listed. A broad summary of the types of text detection methods, and of the systems under each type, is presented. The paper concludes with the existing challenges that pave the way for more active research.

Keywords

Text information extraction, Text detection, Survey, ICDAR, Sliding window, MSER


Copyright information

© The National Academy of Sciences, India 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India
