Experiments in Term Expansion Using Thesauri in Spanish

  • Ángel F. Zazo
  • Carlos G. Figuerola
  • José L. A. Berrocal
  • Emilio Rodríguez
  • Raquel Gómez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2785)

Abstract

This paper presents experiments carried out for the Spanish monolingual task at CLEF 2002. The objective is to continue our research on term expansion. Last year we presented results on stemming; this year our effort is centred on term expansion using thesauri. Many words that derive from the same stem are semantically close. However, words with very different stems can also be close in meaning. In this case, analysing the relationships between words in a document collection makes it possible to construct a thesaurus of related terms, which can then be used to expand a term with its best related terms. We describe experiments that study term expansion using association and similarity thesauri.
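As a rough illustration of the idea in the abstract, the sketch below builds a small association thesaurus from document-level co-occurrence and uses it to expand a term. It is not the authors' method: the Dice coefficient as association measure, whitespace tokenisation, and the function names `build_association_thesaurus` and `expand_term` are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def build_association_thesaurus(docs):
    """Map each term to {related_term: score} using document co-occurrence.

    The score is the Dice coefficient (an assumption for this sketch):
    2 * df(a, b) / (df(a) + df(b)).
    """
    df = Counter()  # number of documents containing each term
    co = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc.lower().split()))  # naive tokenisation
        df.update(terms)
        co.update(combinations(terms, 2))

    thesaurus = {}
    for (a, b), n_ab in co.items():
        score = 2 * n_ab / (df[a] + df[b])
        thesaurus.setdefault(a, {})[b] = score
        thesaurus.setdefault(b, {})[a] = score
    return thesaurus


def expand_term(term, thesaurus, k=2):
    """Return the k best related terms (ties broken alphabetically)."""
    related = thesaurus.get(term, {})
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy collection, invented for the example.
    docs = [
        "coche motor rueda",
        "coche motor",
        "automóvil motor rueda",
        "playa sol",
    ]
    thesaurus = build_association_thesaurus(docs)
    print(expand_term("coche", thesaurus, k=1))
```

A similarity thesaurus would differ mainly in how the scores are computed: terms are represented as vectors over documents and compared with a vector similarity rather than by raw co-occurrence counts.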

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Ángel F. Zazo (1)
  • Carlos G. Figuerola (1)
  • José L. A. Berrocal (1)
  • Emilio Rodríguez (1)
  • Raquel Gómez (1)
  1. Grupo de Recuperación Automatizada de la Información (REINA), Dpto. Informática y Automática, Universidad de Salamanca, Salamanca, Spain
