Using Semantic Distance in a Content-Based Heterogeneous Information Retrieval System

  • Ahmad El Sayed
  • Hakim Hacid
  • Djamel Zighed
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4944)


This paper makes two contributions to the semantic retrieval of heterogeneous documents (documents composed of text and images): (1) a new context-based semantic distance measure for textual data, and (2) an IR system that indexes documents conceptually and automatically by considering their heterogeneous content, using a domain-specific ontology. The proposed semantic distance measure is used to automatically fuzzify our domain ontology. Both proposals are evaluated. On a set of word pairs, our semantic distance measure obtained a correlation ratio of 0.89 with human judgments, outperforming all the other measures. Preliminary combination results obtained on a specialized corpus of web pages are also reported.
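As a rough illustration of how a measure can be scored against human judgments on word pairs, the sketch below computes a Pearson correlation coefficient in plain Python. It assumes Pearson correlation, which the abstract does not confirm; the function name `pearson` and all scores are hypothetical placeholders, not data from the paper. If the measure outputs distances rather than similarities, the expected correlation with human similarity ratings would be negative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder scores for four word pairs (NOT from the paper):
# human similarity ratings and the scores of some automatic measure.
human_ratings = [3.9, 3.5, 1.2, 0.4]
measure_scores = [0.95, 0.80, 0.30, 0.05]

r = pearson(human_ratings, measure_scores)
print(f"correlation with human judgments: {r:.2f}")
```

A value close to 1 means the measure ranks and spaces the word pairs much as the human raters do.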


Keywords: Semantic Similarity · Word Pair · Domain Ontology · Query Term · Query Expansion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ahmad El Sayed (1)
  • Hakim Hacid (1)
  • Djamel Zighed (1)
  1. ERIC Laboratory - 5, University of Lyon 2, Bron cedex, France
