Using Second Order Statistics to Enhance Automated Image Annotation

  • Ainhoa Llorente
  • Stefan Rüger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5478)


We examine whether a traditional automated annotation system can be improved by using background knowledge. By traditional we mean any machine learning approach combined with image analysis techniques. As the baseline for our experiments we use the work of Yavlinsky et al. [1], who deployed non-parametric density estimation. We observe that probabilistic image analysis by itself is not enough to describe the rich semantics of an image. Our hypothesis is that more accurate annotations can be produced by introducing additional knowledge in the form of statistical co-occurrence of terms. This knowledge comes from the context of the images, which keyword generation that treats each term independently would otherwise miss. We test our algorithm on two datasets: Corel 5k and ImageCLEF 2008. On the Corel 5k dataset we obtain significantly better results, and on ImageCLEF 2008 our algorithm ranks in the top quartile of all submitted methods.
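
To make the idea of term co-occurrence concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a toy set of training annotations and hypothetical per-keyword scores from some baseline annotator; the function names, the `top_k` and `weight` parameters, and the additive boosting rule are all illustrative assumptions. It counts how often keyword pairs appear together in training images, then raises the score of candidates that co-occur with the most confident keywords.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotations):
    """Count how often each unordered keyword pair labels the same image."""
    pair_counts = Counter()
    for keywords in annotations:
        # Sort so every pair is stored under one canonical (a, b) key.
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def rerank(scores, pair_counts, top_k=2, weight=0.5):
    """Boost keywords that co-occur with the top_k highest-scoring ones.

    The additive rule below is an illustrative assumption, not a rule
    taken from the paper.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    anchors = ranked[:top_k]
    total = sum(pair_counts.values()) or 1  # avoid division by zero
    adjusted = {}
    for word, score in scores.items():
        support = sum(
            pair_counts[tuple(sorted((word, anchor)))]
            for anchor in anchors
            if anchor != word
        )
        adjusted[word] = score + weight * support / total
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Toy training annotations (hypothetical data).
training = [
    ["sky", "sea", "beach"],
    ["sky", "sea", "boat"],
    ["tiger", "grass"],
    ["sky", "plane"],
    ["sea", "beach"],
]

# Hypothetical baseline scores for one test image.
scores = {"sky": 0.40, "sea": 0.30, "tiger": 0.12, "beach": 0.10, "grass": 0.08}

counts = cooccurrence_counts(training)
reranked = rerank(scores, counts)
print(reranked)
```

In this toy run, "beach" overtakes "tiger" because it frequently appears alongside "sky" and "sea" in the training annotations, which is the kind of contextual evidence that independent keyword scoring ignores.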


Automated Image Annotation, Statistical Analysis, Word Co-occurrence, Semantic Similarity



References

  1. Yavlinsky, A., Schofield, E., Rüger, S.: Automated image annotation using global features and robust nonparametric density estimation. In: Leow, W.-K., Lew, M., Chua, T.-S., Ma, W.-Y., Chaisorn, L., Bakker, E.M. (eds.) CIVR 2005. LNCS, vol. 3568, pp. 507–517. Springer, Heidelberg (2005)
  2. Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.A.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)
  3. Melamed, I.D.: Empirical methods for exploiting parallel texts. PhD thesis, University of Pennsylvania (1998)
  4. Jin, R., Chai, J.Y., Si, L.: Effective automatic image annotation via a coherent language model and active learning. In: Proceedings of the 12th International ACM Conference on Multimedia, pp. 892–899 (2004)
  5. Mori, Y., Takahashi, H., Oka, R.: Image-to-word transformation based on dividing and vector quantizing images with words. In: International Workshop on Multimedia Intelligent Storage and Retrieval Management (1999)
  6. Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D., Jordan, M.: Matching words and pictures. Journal of Machine Learning Research 3, 1107–1135 (2003)
  7. Jin, Y., Khan, L., Wang, L., Awad, M.: Image annotations by combining multiple evidence & WordNet. In: Proceedings of the 13th International ACM Conference on Multimedia, pp. 706–715 (2005)
  8. Liu, J., Li, M., Ma, W.Y., Liu, Q., Lu, H.: An adaptive graph model for automatic image annotation. In: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pp. 61–70 (2006)
  9. Zhou, X., Wang, M., Zhang, Q., Zhang, J., Shi, B.: Automatic image annotation by an iterative approach: incorporating keyword correlations and region matching. In: Proceedings of the International ACM Conference on Image and Video Retrieval, pp. 25–32 (2007)
  10. Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of International ACM Conference on Research and Development in Information Retrieval, pp. 119–126 (2003)
  11. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
  12. Escalante, H.J., Montes, M., Sucar, L.E.: Word co-occurrence and Markov Random Fields for improving automatic image annotation. In: Proceedings of the 18th British Machine Vision Conference (2007)
  13. Tollari, S., Detyniecki, M., Fakeri-Tabrizi, A., Amini, M.R., Gallinari, P.: UPMC/LIP6 at ImageCLEFphoto 2008: On the exploitation of visual concepts (VCDT). In: Evaluating Systems for Multilingual and Multimodal Information Access – 9th Workshop of the Cross-Language Evaluation Forum (2008)
  14. Deselaers, T., Hanbury, A.: The visual concept detection task in ImageCLEF 2008. In: Evaluating Systems for Multilingual and Multimodal Information Access – 9th Workshop of the Cross-Language Evaluation Forum (2008)
  15. Miller, G.A., Charles, W.G.: Contextual correlates of semantic similarity. Journal of Language and Cognitive Processes 6, 1–28 (1991)
  16. Manning, C.D., Schütze, H.: Foundations of statistical natural language processing. MIT Press, Cambridge (1999)
  17. Makadia, A., Pavlovic, V., Kumar, S.: A new baseline for image annotation. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302. Springer, Heidelberg (2008)
  18. Fawcett, T.: An introduction to ROC analysis. Pattern Recognition Letters 27(8), 861–874 (2006)
  19. Hauptmann, A., Yan, R., Lin, W.H.: How many high-level concepts will fill the semantic gap in news video retrieval? In: Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pp. 627–634 (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ainhoa Llorente (1)
  • Stefan Rüger (1)
  1. Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
