Localization of Caption Texts in Natural Scenes Using a Wavelet Transformation

  • Javier Jiménez
  • Enric Martí
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3287)


Automatic extraction of text from multimedia content is an important problem that must be solved to build more effective retrieval engines. Recently, Crandall, Antani and Kasturi showed that a direct analysis of certain DCT coefficients can be used to locate potential regions of caption text in MPEG-1 videos. In this paper, we extend their proposal to wavelet-coded images and show that text superimposed on natural scenes can also be localized effectively and efficiently by applying a wavelet transformation to the image and then analyzing the distribution of second-order statistics in the high-frequency wavelet bands.
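The general idea of a wavelet transformation followed by second-order statistics on the high-frequency bands can be illustrated with a minimal sketch. It is not the authors' algorithm: the one-level Haar transform with averaging normalization, the per-block mean squared coefficient as the statistic, and the names `haar2d`, `block_energy`, `candidate_blocks`, `block` and `threshold` are all assumptions chosen for illustration.

```python
def haar2d(img):
    """One-level 2D Haar transform (averaging normalization, an assumption).

    img is a list of rows with even height and width. Returns four
    half-size bands: (approximation, row_detail, col_detail, diag_detail).
    """
    approx, row_d, col_d, diag_d = [], [], [], []
    for r in range(0, len(img), 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            a_row.append((a + b + d + e) / 4.0)  # local average (low frequency)
            r_row.append((a + b - d - e) / 4.0)  # top row minus bottom row
            c_row.append((a - b + d - e) / 4.0)  # left column minus right column
            d_row.append((a - b - d + e) / 4.0)  # diagonal detail
        approx.append(a_row)
        row_d.append(r_row)
        col_d.append(c_row)
        diag_d.append(d_row)
    return approx, row_d, col_d, diag_d


def block_energy(bands, block=4):
    """Mean squared coefficient per block, summed over the given bands."""
    height, width = len(bands[0]), len(bands[0][0])
    energy = []
    for r in range(0, height - block + 1, block):
        row = []
        for c in range(0, width - block + 1, block):
            total = 0.0
            for band in bands:
                for i in range(r, r + block):
                    for j in range(c, c + block):
                        total += band[i][j] ** 2
            row.append(total / (block * block))
        energy.append(row)
    return energy


def candidate_blocks(img, block=4, threshold=100.0):
    """Flag blocks whose high-frequency energy exceeds a hypothetical threshold."""
    _, row_d, col_d, diag_d = haar2d(img)
    energy = block_energy([row_d, col_d, diag_d], block)
    return [[value > threshold for value in row] for row in energy]


if __name__ == "__main__":
    # Synthetic 16x16 image: flat grey, with fine vertical stripes
    # (a crude stand-in for text strokes) in the top-left 8x8 corner.
    img = [[(255 if c % 2 else 0) if (r < 8 and c < 8) else 128
            for c in range(16)] for r in range(16)]
    print(candidate_blocks(img))
```

In this sketch only the striped corner is flagged, because flat regions produce zero detail coefficients. A real detector would need a threshold and statistic tuned on actual images, which the abstract does not specify.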


Natural-scene statistics, text localization, text segmentation, wavelets, texture analysis, image analysis, computer vision


  1. O’Gorman, L., Kasturi, R. (eds.): Document Image Analysis. IEEE Computer Society Press, Los Alamitos (1997) (Published as Technical Briefing)
  2. Hu, J., Bagga, A.: Categorizing images in web documents. IEEE Trans. on Multimedia, 22–30 (2004)
  3. Allier, B., Duong, J., Gagneux, A., Mallet, P., Emptoz, H.: Texture feature characterization for logical pre-labeling. In: Proc. of Int. Conference on Document Analysis and Recognition, pp. 567–571 (2003)
  4. Zhong, Y., Karu, K., Jain, A.K.: Locating text in complex color images. In: Proc. of Int. Conference on Document Analysis and Recognition, pp. 146–149 (1995)
  5. Patel, D.: Page segmentation for document image analysis using a neural network. Optical Engineering 35, 1854–1861 (1996)
  6. Payne, J.S., Stonham, T.J., Patel, D.: Document segmentation using texture analysis. In: Proc. of Int. Conference on Pattern Recognition, pp. 380–382 (1994)
  7. Jain, A.K., Bhattacharjee, S.K.: Address block location on envelopes using Gabor filters. Pattern Recognition 25, 1459–1477 (1992)
  8. Menoti, D., Borges, D.L., Facon, J., Britto, A.S.: Segmentation of postal envelopes for address block location: an approach based on feature selection in wavelet space. In: Proc. of Int. Conf. on Document Analysis and Recognition, pp. 699–703 (2003)
  9. Crandall, D., Antani, S., Kasturi, R.: Extraction of special effects caption text events from digital video. Int. Journal on Document Analysis and Recognition 5, 138–157 (2003)
  10. van Hateren, J.H., Ruderman, D.L.: Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proc. of the Royal Society of London, Series B 265, 2315–2320 (1998)
  11. Brodatz, P.: Textures: A Photographic Album for Artists and Designers. Dover Publications, N.Y. (1966)
  12. Rao, K.R., Yip, P.: Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press, London (1990)
  13. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, London (1998)
  14. Olshausen, B.A., Field, D.J.: Sparse coding with an overcomplete basis set: a strategy employed by V1. Vision Research 37, 3311–3325 (1997)
  15. Olshausen, B.A., Field, D.J.: Natural image statistics and efficient coding. Network: Computation in Neural Systems 7, 333–339 (1996)
  16. Wang, J.Z.: Integrated Region-based Image Retrieval. Kluwer Academic Publishers, The Netherlands (2001)
  17. Mallat, S.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. on Pattern Analysis and Machine Intelligence 11, 674–693 (1989)
  18. Bhattacharya, U., Chaudhuri, B.B.: A majority voting scheme for multiresolution recognition of handprinted numerals. In: Proc. of Int. Conference on Document Analysis and Recognition (ICDAR), pp. 16–20 (2003)
  19. Li, H., Doermann, D.S.: Automatic identification of text in digital video key frames. In: Proc. of Int. Conference on Pattern Recognition, pp. 129–132 (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Javier Jiménez (1)
  • Enric Martí (1)
  1. Centre de Visió per Computador, Dept. Informàtica, UAB, Edifici O, Campus UAB, Bellaterra (Barcelona), Spain
