Towards a Semantic Representation of Documents by Ontology-Document Mapping

  • Mustapha Baziz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3192)


This paper deals with the use of ontologies in the field of Information Retrieval. It introduces an approach to representing document content by ontology-document matching. The approach consists in detecting concepts (single-word and multiword) in a document by means of a general-purpose ontology, namely WordNet. Two criteria are then used: co-occurrence, to identify the important concepts in a document, and semantic similarity, to compute the semantic relatedness between these concepts and then to disambiguate them. The result is a set of scored concept senses (nodes) with weighted links, called the semantic core of the document, which best represents the document's semantic content. We regard the proposed and evaluated approach as a short but strong step toward the long-term goal of Intelligent Indexing and Semantic Retrieval.
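The pipeline summarized in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the sense inventory `SENSES`, the similarity table `SIM`, and the function names are all hypothetical stand-ins for WordNet and for a real semantic similarity measure, and plain occurrence counts are used as a simplified substitute for the co-occurrence criterion.

```python
import re
from itertools import combinations

# Hypothetical toy sense inventory standing in for WordNet: term -> candidate senses.
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "interest": ["interest#curiosity", "interest#finance"],
    "interest rate": ["interest_rate#finance"],
    "loan": ["loan#finance"],
}

# Hypothetical symmetric similarity scores between senses (unlisted pairs score 0).
SIM = {
    frozenset(["bank#finance", "interest_rate#finance"]): 0.8,
    frozenset(["bank#finance", "loan#finance"]): 0.9,
    frozenset(["interest_rate#finance", "loan#finance"]): 0.7,
    frozenset(["bank#river", "loan#finance"]): 0.1,
}


def sim(a, b):
    """Look up the toy similarity between two senses."""
    return SIM.get(frozenset([a, b]), 0.0)


def detect_concepts(text, max_len=3):
    """Greedy longest-match detection of single-word and multiword terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {}
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "interest rate" wins over "interest".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + n])
            if term in SENSES:
                counts[term] = counts.get(term, 0) + 1
                i += n
                break
        else:
            i += 1  # no ontology entry starts at this token
    return counts


def semantic_core(text):
    """Return (nodes, links): scored senses and weighted links between them."""
    counts = detect_concepts(text)
    chosen = {}
    for term in counts:
        def support(sense):
            # Sum, over the other detected terms, of the best similarity
            # between this sense and any sense of the other term.
            return sum(
                max(sim(sense, other) for other in SENSES[t])
                for t in counts
                if t != term
            )
        chosen[term] = max(SENSES[term], key=support)
    # Node score: occurrence count (a simplification of the co-occurrence criterion).
    nodes = {chosen[t]: counts[t] for t in counts}
    links = {}
    for a, b in combinations(sorted(nodes), 2):
        if sim(a, b) > 0:
            links[(a, b)] = sim(a, b)
    return nodes, links


nodes, links = semantic_core(
    "The bank raised the interest rate on every loan, and the bank approved the loan."
)
```

In this sketch, "bank" resolves to its financial sense because that sense is the one best supported by the senses of the other detected terms. A realistic version would replace the two toy tables with WordNet lookups and a taxonomy-based or gloss-based similarity measure.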


Information Retrieval; Semantic representation of documents; ontologies; WordNet





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Mustapha Baziz (1)
  1. IRIT, Campus universitaire Toulouse III, Toulouse Cedex 4, France
