Abstract
We analyze several spatial-frequency-based features used for text region detection in natural scene images and redefine the DCT-based feature. We employ Fisher’s discriminant analysis to improve the DCT-based feature and achieve higher accuracy. An unsupervised thresholding method for discriminating text from non-text regions is also introduced and tested. Experimental results show that a wide high-frequency band, covering some lower-middle frequency components, is generally more suitable for scene text detection than the band used in the original definition of the DCT-based feature.
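To make the idea of a band-limited DCT feature concrete, the following is a minimal sketch, not the definition used in the article. It assumes 8×8 blocks, a feature equal to the sum of absolute DCT coefficients whose indices satisfy `lo <= u + v <= hi`, and Otsu's method as one possible unsupervised threshold. The function names, the band limits `lo` and `hi`, and the histogram bin count are all hypothetical choices made for illustration; the Fisher discriminant step is omitted.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    x = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return c

def block_band_energy(gray, n=8, lo=2, hi=14):
    """Per-block sum of |DCT coefficient| over the band lo <= u + v <= hi.

    The band limits are illustrative assumptions, not values from the article.
    """
    c = dct_matrix(n)
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    band = (u + v >= lo) & (u + v <= hi)
    rows, cols = gray.shape[0] // n, gray.shape[1] // n
    feat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = gray[i * n:(i + 1) * n, j * n:(j + 1) * n].astype(float)
            coeffs = c @ block @ c.T  # separable 2-D DCT of the block
            feat[i, j] = np.abs(coeffs[band]).sum()
    return feat

def otsu_threshold(values, bins=64):
    """Threshold that maximizes the between-class variance of a histogram."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                 # weight of the low class
    w1 = 1.0 - w0                     # weight of the high class
    m = np.cumsum(p * centers)        # cumulative mean
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (m[-1] * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b))
    return edges[k + 1]               # upper edge of the last low-class bin
```

Under these assumptions, calling `block_band_energy` on a grayscale array and comparing the result against `otsu_threshold` of that feature map yields a per-block text/non-text mask. Widening or narrowing the band via `lo` and `hi` is the kind of choice the abstract's conclusion concerns.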
References
Crandall, D., Antani, S., Kasturi, R.: Extraction of special effects caption text events from digital video. Int. J. Document Anal. Recogn. (IJDAR) 5(2–3), 138–157 (2003)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188 (1936)
Gllavata, J., Ewerth, R., Freisleben, B.: Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In: Proceedings of 17th International Conference on Pattern Recognition, vol. 1, pp. 425–428 (2004)
Jung, K., Kim, K.I., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recognit. 37, 977–997 (2004)
Kim, K.C., Byun, H.R., Song, Y.J., Choi, Y.W., Chi, S.Y., Kim, K.K., Chung, Y.K.: Scene text extraction in natural scene images using hierarchical feature combining and verification. In: Proceedings of 17th International Conference on Pattern Recognition, vol. 2, pp. 679–682 (2004)
Liang, J., Doermann, D., Li, H.: Camera-based analysis of text and documents: a survey. Int. J. Document Anal. Recogn. (IJDAR) 7(2–3), 84–104 (2005)
Lim, Y.K., Choi, S.H., Lee, S.W.: Text extraction in MPEG compressed video for content-based indexing. In: Proceedings of 15th International Conference on Pattern Recognition, vol. 4, pp. 409–412 (2000)
Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: Proceedings of 7th International Conference on Document Analysis and Recognition (ICDAR 2003), vol. II, pp. 682–687 (2003)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. SMC-9(1), 62–66 (1979)
Zhong, Y., Zhang, H., Jain, A.K.: Automatic caption localization in compressed video. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-22(4), 385–392 (2000)
Cite this article
Goto, H. Redefining the DCT-based feature for scene text detection. IJDAR 11, 1–8 (2008). https://doi.org/10.1007/s10032-008-0061-9