Exploring the Semantics behind a Collection to Improve Automated Image Annotation

  • Ainhoa Llorente
  • Enrico Motta
  • Stefan Rüger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6242)


The goal of this research is to explore several semantic relatedness measures that help refine the annotations generated by a baseline non-parametric density estimation algorithm. Specifically, we compare the benefits of computing statistical correlation, either from the training set or from the World Wide Web, against approaches based on knowledge sources such as the WordNet thesaurus or Wikipedia (treated as a hyperlink structure). Experiments are carried out on the dataset provided by the 2009 edition of the ImageCLEF competition, a subset of the MIR-Flickr 25k collection. The best results are obtained by the approaches based on statistical correlation, as they do not depend on a prior disambiguation phase, unlike the WordNet- and Wikipedia-based approaches. Further work is needed to assess whether proper disambiguation schemes might improve the performance of the latter.
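The abstract contrasts relatedness measures derived from co-occurrence statistics (training set or Web) with measures derived from knowledge sources. As an illustration only, the Python sketch below implements textbook versions of three such measures and one possible way of using them to re-score a baseline annotator's output. The three measures are the phi coefficient (Pearson correlation of binary labels) over training-set annotations, the Normalized Google Distance (NGD) of Cilibrasi and Vitányi over Web hit counts, and the Milne–Witten measure over Wikipedia in-links. The exp(-2·NGD) conversion, the hypothetical `refine` rule with its weight `lam`, and the toy data are assumptions, not the authors' method. The Wikipedia function presumes the disambiguation phase is already done, and WordNet-based measures are omitted.

```python
import math


def phi_correlation(labels, a, b):
    """Phi coefficient (Pearson correlation of two binary variables) between
    concepts a and b; `labels` is a list of per-image label sets."""
    n = len(labels)
    n_a = sum(1 for s in labels if a in s)
    n_b = sum(1 for s in labels if b in s)
    n_ab = sum(1 for s in labels if a in s and b in s)
    denom = math.sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    return (n * n_ab - n_a * n_b) / denom if denom else 0.0


def normalized_google_distance(f_x, f_y, f_xy, n):
    """NGD from hit counts f(x), f(y), f(x, y) and index size n."""
    if f_xy == 0:
        return math.inf  # the terms never co-occur
    lx, ly = math.log(f_x), math.log(f_y)
    return (max(lx, ly) - math.log(f_xy)) / (math.log(n) - min(lx, ly))


def web_relatedness(f_x, f_y, f_xy, n):
    """Assumed mapping of NGD to a relatedness score in [0, 1]."""
    return math.exp(-2.0 * normalized_google_distance(f_x, f_y, f_xy, n))


def wikipedia_link_relatedness(links_a, links_b, total_articles):
    """Milne-Witten measure over the sets of articles linking to each article;
    assumes concepts were already disambiguated to Wikipedia articles."""
    common = len(links_a & links_b)
    if common == 0:
        return 0.0
    la, lb = len(links_a), len(links_b)
    dist = (math.log(max(la, lb)) - math.log(common)) / (
        math.log(total_articles) - math.log(min(la, lb)))
    return max(0.0, 1.0 - dist)


def refine(scores, relatedness, lam=0.3):
    """Hypothetical re-scoring rule: blend each concept's baseline score with
    the relatedness-weighted support from the other candidate concepts."""
    refined = {}
    for c, p in scores.items():
        others = [(o, q) for o, q in scores.items() if o != c]
        total = sum(q for _, q in others)
        support = sum(q * relatedness(c, o) for o, q in others) / total if total else 0.0
        refined[c] = (1 - lam) * p + lam * support
    return refined


# Toy data (invented for illustration): training label sets and baseline scores.
train = [{"sky", "clouds"}, {"sky", "sea"}, {"indoor", "person"},
         {"sky", "clouds", "landscape"}, {"person", "indoor"}]
baseline = {"sky": 0.8, "clouds": 0.4, "indoor": 0.5}


def rel(a, b):
    # Clip negative correlations to zero so the support term stays non-negative.
    return max(0.0, phi_correlation(train, a, b))


refined = refine(baseline, rel)
print(sorted(refined, key=refined.get, reverse=True))  # ['sky', 'clouds', 'indoor']
```

With this toy training set, `indoor` never co-occurs with `sky` or `clouds`, so its refined score falls below that of `clouds`, reversing their baseline order. Swapping `rel` for a wrapper around `web_relatedness` or `wikipedia_link_relatedness` changes only the knowledge source, not the re-scoring step.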


Keywords: Semantic Similarity · Semantic Relatedness · Area Under Curve · Mean Average Precision · Automated Image Annotation
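The keywords list Area Under Curve and Mean Average Precision, but this page does not define how the paper computes them. Assuming the standard per-concept definitions, where each concept ranks the test images by annotation score, a minimal Python sketch follows. The function names and toy data are hypothetical, and averaging AP over concepts rather than over images is an assumption.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive image outranks a
    random negative one (ties count half); O(P*N), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative image")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Non-interpolated AP: mean of precision@k over ranks k holding a positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(per_concept):
    """MAP over concepts; per_concept maps concept -> (scores, labels)."""
    return sum(average_precision(s, y) for s, y in per_concept.values()) / len(per_concept)


# Toy example (invented): scores of one concept over four test images.
scores, labels = [0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]
print(roc_auc(scores, labels), average_precision(scores, labels))  # AUC = 3/4, AP = (1/1 + 2/3) / 2 = 5/6
```

AUC is insensitive to where the positives sit at the top of the ranking, while AP rewards placing them early, which is why the two are often reported together.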





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ainhoa Llorente (1)
  • Enrico Motta (1)
  • Stefan Rüger (1)

  1. Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
