
An evidence-based model of saliency feature extraction for scene text analysis

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Salient text consists of characters arranged with high visibility and expressivity; it also carries important clues for video analysis, indexing, and retrieval. A critical stage in localizing salient text is therefore collecting key points from real text pixels. In this paper, we propose an evidence-based model of saliency feature extraction (SFE) to probe saliency text points (STPs), which exhibit strong text signal structure across multiple observations simultaneously and consistently appear at the boundary between text and its background. From these multiple observations, a signal structure with rhythms of signal segments is extracted at every location in the visual field; these structures serve as sources of evidence for the evidence-based model, in which the evidence is measured to estimate the degree of plausibility that a location is an STP. Evaluation on benchmark datasets demonstrates that the proposed approach achieves state-of-the-art performance in locating real text pixels and significantly outperforms existing algorithms for detecting text candidates. The STPs can thus serve as highly reliable text candidates for future text detectors.
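The abstract's "degrees of plausibility" is the standard vocabulary of Dempster–Shafer evidence theory, where mass functions from several observations are combined and a hypothesis is scored by its belief and plausibility. The sketch below is a generic illustration of that combination step only, not the paper's actual SFE model: the observation names (`m_edge`, `m_corner`) and all mass values are invented for the example.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Masses are dicts mapping frozenset hypotheses to mass values;
    conflicting mass is discarded and the rest renormalized.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    norm = 1.0 - conflict
    return {h: v / norm for h, v in combined.items()}

def belief(m, h):
    # Total mass of subsets that entail hypothesis h.
    return sum(v for a, v in m.items() if a <= h)

def plausibility(m, h):
    # Total mass of subsets consistent with hypothesis h.
    return sum(v for a, v in m.items() if a & h)

TEXT, NONTEXT = frozenset({"text"}), frozenset({"nontext"})
THETA = TEXT | NONTEXT  # the full frame of discernment (ignorance)

# Two hypothetical observations of one pixel location (invented values):
m_edge   = {TEXT: 0.6, NONTEXT: 0.1, THETA: 0.3}
m_corner = {TEXT: 0.5, NONTEXT: 0.2, THETA: 0.3}

m = combine(m_edge, m_corner)
# Plausibility that the pixel is a text point, given both observations:
print(plausibility(m, TEXT))
```

Belief and plausibility bracket the combined support for "text": a detector can threshold plausibility to collect candidate STPs and use belief as the conservative score.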


Figs. 1–19 (figure images not included in this preview)



Author information


Corresponding author

Correspondence to Pao-Ta Yu.


About this article


Cite this article

Chen, YL., Yu, PT. An evidence-based model of saliency feature extraction for scene text analysis. IJDAR 19, 269–287 (2016). https://doi.org/10.1007/s10032-016-0270-6

