Redefining the DCT-based feature for scene text detection

Analysis and comparison of spatial frequency-based features
  • Hideaki Goto
Original Paper


We analyze several spatial frequency-based features used for text region detection in natural scene images, and redefine the DCT-based feature. We employ Fisher’s discriminant analysis to improve the DCT-based feature and achieve higher accuracy. An unsupervised thresholding method for discriminating text from non-text regions is also introduced and tested. Experimental results show that a wide high-frequency band, covering some lower-middle frequency components, is generally more suitable for scene text detection, contrary to the original definition of the DCT-based feature.
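As a rough illustration of the kind of pipeline the abstract describes, the sketch below computes a block-wise DCT band-energy feature and separates blocks with an unsupervised threshold. It is a minimal sketch under stated assumptions, not the paper's definition: the 8×8 block size, the band rule `u + v >= low`, the cut-off `low`, the function names, and the use of an Otsu-style threshold as the unsupervised method are all illustrative choices. The weighting by Fisher's discriminant analysis is not shown.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index
    x = np.arange(n)[None, :]  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def band_energy_feature(gray, block=8, low=2):
    """Sum of |DCT coefficients| per block over the band u + v >= low.

    `low` is a hypothetical cut-off; lowering it widens the band toward
    lower-middle frequencies.
    """
    h, w = gray.shape
    h, w = h - h % block, w - w % block  # crop to whole blocks
    c = dct_matrix(block)
    u, v = np.meshgrid(np.arange(block), np.arange(block), indexing="ij")
    band = (u + v) >= low
    blocks = (
        gray[:h, :w]
        .astype(float)
        .reshape(h // block, block, w // block, block)
        .swapaxes(1, 2)
    )
    coeffs = c @ blocks @ c.T  # 2-D DCT of every block at once
    return np.abs(coeffs * band).sum(axis=(2, 3))


def otsu_threshold(values, bins=64):
    """Otsu-style threshold that maximizes between-class variance.

    Used here only as a stand-in for an unsupervised threshold.
    """
    hist, edges = np.histogram(values.ravel(), bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # weight of the low class up to each bin
    m = np.cumsum(p * centers)   # cumulative mean up to each bin
    w1 = 1.0 - w0
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros_like(w0)
    between[valid] = (m[-1] * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    # Upper edge of the best split bin separates the two classes.
    return edges[np.argmax(between) + 1]


# Synthetic example: flat left half, high-frequency checkerboard right half.
image = np.full((32, 64), 128.0)
yy, xx = np.mgrid[0:32, 32:64]
image[:, 32:] = 255.0 * ((yy + xx) % 2)

feature = band_energy_feature(image)
text_like = feature > otsu_threshold(feature)
```

In this toy image, only the blocks in the checkerboard half carry energy in the selected band, so `text_like` marks that half. Real scene images would need the band and threshold evaluated against labeled data.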


Keywords: Scene text, Text region detection, Discrete cosine transform, Fisher’s discriminant analysis





Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. Information Synergy Center, Tohoku University, Sendai, Japan
