Conceptual Indexing Based on Document Content Representation

  • Mustapha Baziz
  • Mohand Boughanem
  • Nathalie Aussenac-Gilles
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3507)


This paper addresses an important problem related to the use of semantics in information retrieval (IR): how to represent document semantics and use that representation properly in retrieval. The approach we propose represents the content of a document by its best semantic network, called the document semantic core, which is built in two main steps. In the first step, concepts (words and phrases) are extracted from the document, driven by an external general-purpose ontology, namely WordNet. In the second step, a global disambiguation of the extracted concepts with respect to the document yields the best semantic network. The selected concepts form the nodes of the semantic network, and the links between connected nodes are weighted by similarity measure values. The resulting scored concepts are used for conceptual indexing of the document in information retrieval.
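The two steps can be illustrated with a small sketch. The abstract does not state the selection criterion or the similarity measure, so everything specific here is an assumption: the candidate senses and similarity values are toy data (not taken from WordNet), the names `CANDIDATES`, `SIMILARITY`, `sim`, and `best_semantic_core` are hypothetical, and the "best" network is taken to be the sense combination that maximizes the total pairwise similarity. Each selected concept is then scored by the sum of the weights of its links.

```python
from itertools import combinations, product

# Step 1 (assumed already done): candidate senses for each extracted term.
# Toy data for illustration only; a real system would look these up in WordNet.
CANDIDATES = {
    "bank": ["bank#finance", "bank#river"],
    "loan": ["loan#finance"],
    "interest": ["interest#finance", "interest#curiosity"],
}

# Hypothetical symmetric similarity values between senses.
# Pairs that are not listed are treated as having similarity 0.
SIMILARITY = {
    frozenset(["bank#finance", "loan#finance"]): 0.9,
    frozenset(["bank#finance", "interest#finance"]): 0.8,
    frozenset(["loan#finance", "interest#finance"]): 0.7,
    frozenset(["bank#river", "interest#curiosity"]): 0.1,
}


def sim(a, b):
    """Return the similarity between two senses (0.0 if unknown)."""
    return SIMILARITY.get(frozenset([a, b]), 0.0)


def best_semantic_core(candidates):
    """Step 2: pick one sense per term so that total pairwise similarity is maximal.

    Exhaustive search is an assumption made for clarity; it is exponential
    in the number of ambiguous terms and only suitable for tiny examples.
    """
    terms = sorted(candidates)
    best_total, best_choice = -1.0, None
    for choice in product(*(candidates[t] for t in terms)):
        total = sum(sim(a, b) for a, b in combinations(choice, 2))
        if total > best_total:
            best_total, best_choice = total, choice
    # Nodes are the selected senses; links are weighted by similarity.
    links = {(a, b): sim(a, b) for a, b in combinations(best_choice, 2)}
    # Score each concept by the summed weight of its links.
    scores = {
        c: sum(w for pair, w in links.items() if c in pair) for c in best_choice
    }
    return dict(zip(terms, best_choice)), links, scores


if __name__ == "__main__":
    senses, links, scores = best_semantic_core(CANDIDATES)
    print(senses)
    print(scores)
```

In this toy example the financial senses reinforce one another, so they are selected together. The resulting concept scores could then serve as index weights in place of plain term frequencies.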


Keywords: Information Retrieval; Semantic Representation of Documents; Similarity Measures; Conceptual Indexing; Ontologies; WordNet





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Mustapha Baziz (1)
  • Mohand Boughanem (1)
  • Nathalie Aussenac-Gilles (1)
  1. IRIT, Toulouse Cedex 4, France
