Exploiting Disambiguated Thesauri for Information Retrieval in Metadata Catalogs

  • Javier Nogueras-Iso
  • Javier Lacasta
  • José Ángel Bañares
  • Pedro R. Muro-Medrano
  • F. Javier Zarazaga-Soria
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3040)


Information in Digital Libraries is explicitly organized, described, and managed. The content of their data resources is summarized in short descriptions, usually called metadata, which can be either entered manually or generated automatically. In this context, specialized thesauri are frequently used to provide accurate content for subject or keyword metadata elements. However, if a Digital Library aims to provide access to the general public, it is not reasonable to assume that casual users will use the same terms as the keywords in the metadata records. As an initial step toward bridging the semantic gap between user queries and metadata records, the authors of this paper previously developed a method for the semantic disambiguation of thesauri with respect to an upper-level ontology (WordNet). This paper presents the integration of this disambiguation into an information retrieval system, in this case by adapting the vector-space retrieval model. Thanks to the disambiguation, both metadata records and queries can be homogeneously represented as collections of WordNet synsets, which enables the computation of a similarity value that is used to rank the results.
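The idea of representing both records and queries as collections of synsets and ranking by a similarity value can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the weighting scheme, so the sketch assumes raw occurrence counts as weights and cosine similarity as the measure. The function names, record identifiers, and synset labels (written in a `word.n.01` style) are hypothetical placeholders and are not looked up in WordNet.

```python
import math
from collections import Counter


def synset_vector(synset_ids):
    """Build a sparse vector: synset id -> weight (assumed: raw count)."""
    return Counter(synset_ids)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse synset vectors."""
    dot = sum(weight * b[synset] for synset, weight in a.items() if synset in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_records(query_synsets, records):
    """Return (record id, similarity) pairs, best match first.

    Records that share no synset with the query are dropped.
    """
    query_vec = synset_vector(query_synsets)
    scored = [
        (record_id, cosine_similarity(query_vec, synset_vector(synsets)))
        for record_id, synsets in records.items()
    ]
    scored = [(record_id, score) for record_id, score in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical metadata records, already mapped to synset labels.
records = {
    "rec-hydrology": ["river.n.01", "basin.n.01", "water.n.01"],
    "rec-transport": ["road.n.01", "railway.n.01"],
    "rec-coast": ["water.n.01", "shore.n.01"],
}

# Hypothetical user query, already mapped to synset labels.
ranking = rank_records(["river.n.01", "water.n.01"], records)
for record_id, score in ranking:
    print(f"{record_id}: {score:.3f}")
```

Because the comparison is made on synsets rather than on surface words, a query term and a thesaurus keyword that map to the same synset match even when the words differ, which is the gap between user vocabulary and catalog vocabulary that the abstract describes.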


Digital Library; Retrieval Model; Query Term; User Query; Information Retrieval System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Javier Nogueras-Iso (1)
  • Javier Lacasta (1)
  • José Ángel Bañares (1)
  • Pedro R. Muro-Medrano (1)
  • F. Javier Zarazaga-Soria (1)

  1. Computer Science and Systems Engineering Department, University of Zaragoza, Zaragoza, Spain
