Using Thesauri in Cross-Language Retrieval of German and French Indexed Collections
For CLEF 2002, Berkeley’s Group One experimented with Russian, French and English as query languages, and investigated thesaurus-aided retrieval for the special CLEF collections GIRT and Amaryllis. Two techniques were used to locate source language topic terms within the controlled vocabulary and replace them with the document language thesaurus terms to form the query sent against the collection index. This form of controlled vocabulary-aided translation is called thesaurus matching. Results show that thesaurus-aided cross-language retrieval performs slightly worse than machine translation retrieval on average, but can yield decidedly better results for particular queries. In addition, Berkeley submitted runs to the monolingual and bilingual (French and German) CLEF main tasks. We found that bilingual retrieval sometimes outperforms monolingual retrieval and postulate reasons to explain this phenomenon.
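As an illustration of thesaurus matching, the following minimal Python sketch looks up each source-language topic term in a bilingual controlled vocabulary and substitutes the document-language term. The abstract does not say which two techniques were used, so the combination here (exact lookup followed by fuzzy string matching with the standard library's `difflib`) is an assumption. The toy English-German entries, the function name `thesaurus_match`, and the `cutoff` value are likewise hypothetical and are not taken from the GIRT or Amaryllis thesauri.

```python
from difflib import get_close_matches

# Hypothetical toy English -> German vocabulary (assumption for illustration;
# these are not entries copied from the GIRT thesaurus).
THESAURUS = {
    "unemployment": "Arbeitslosigkeit",
    "labor market": "Arbeitsmarkt",
    "migration": "Migration",
    "youth": "Jugend",
}


def thesaurus_match(topic_terms, thesaurus, cutoff=0.8):
    """Map source-language terms to document-language thesaurus terms.

    Returns (query_terms, unmatched_terms). Exact lookup is tried first;
    if it fails, the closest thesaurus entry above `cutoff` is used.
    """
    query_terms, unmatched_terms = [], []
    for term in topic_terms:
        key = term.lower().strip()
        if key in thesaurus:
            # Technique 1 (assumed): exact match against the vocabulary.
            query_terms.append(thesaurus[key])
            continue
        # Technique 2 (assumed): fuzzy match on character similarity.
        close = get_close_matches(key, list(thesaurus), n=1, cutoff=cutoff)
        if close:
            query_terms.append(thesaurus[close[0]])
        else:
            # No thesaurus entry is close enough; leave the term for
            # another translation route (e.g. machine translation).
            unmatched_terms.append(term)
    return query_terms, unmatched_terms


if __name__ == "__main__":
    terms = ["Unemployment", "labour market", "globalization"]
    print(thesaurus_match(terms, THESAURUS))
```

In this sketch the spelling variant "labour market" is recovered by the fuzzy step, while "globalization" has no sufficiently similar entry and is returned as unmatched. That split is consistent with the abstract's observation that thesaurus-aided retrieval helps on particular queries but not on average.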
Keywords: Machine Translation; Defense Advance Research Project Agency; Stop Word List; Fuzzy Match; Query Topic