Abstract
Multilingual retrieval (querying multiple document collections, each in a different language) can be achieved by combining several individual techniques that enhance retrieval: machine translation to cross the language barrier, relevance feedback to add words to the initial query, decompounding for languages with complex term structure, and data fusion to combine monolingual retrieval results from different languages. Using the CLEF 2001 and CLEF 2002 topics and document collections, this paper evaluates these techniques within the context of a monolingual document ranking formula based upon logistic regression. Each individual technique yields improved performance over runs that do not use that technique. Moreover, the techniques are complementary, in that combining the best techniques outperforms any individual technique. An approximate but fast document translation method using bilingual wordlists created from machine translation systems is presented and evaluated. Fast document translation is as effective as query translation in multilingual retrieval. Furthermore, when fast document translation is combined with query translation in multilingual retrieval, the performance is significantly better than that of either query translation or fast document translation alone.
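As a rough illustration of the fast document translation idea mentioned above, the following minimal sketch replaces each document token with its entry in a bilingual wordlist. It is an assumption-laden toy, not the authors' implementation: the `WORDLIST` contents, the `translate_document` name, and the choice to keep unknown tokens unchanged are all hypothetical, and the abstract does not specify how the actual wordlists are derived from the machine translation systems.

```python
from typing import Dict, List

# Hypothetical German-to-English wordlist with one translation per source word.
# In the paper the wordlists are created from machine translation systems;
# that construction step is not reproduced here.
WORDLIST: Dict[str, str] = {
    "haus": "house",
    "katze": "cat",
    "und": "and",
}


def translate_document(tokens: List[str], wordlist: Dict[str, str]) -> List[str]:
    """Translate a tokenized document word by word.

    Each token is lowercased for lookup. Tokens missing from the wordlist
    (for example proper nouns) are kept unchanged; this is an assumption
    made for the sketch.
    """
    return [wordlist.get(token.lower(), token) for token in tokens]


if __name__ == "__main__":
    print(translate_document(["Katze", "und", "Haus"], WORDLIST))
```

Because translation reduces to one dictionary lookup per token, a whole collection can be translated once at indexing time and then searched with untranslated queries, which is what makes this kind of approach fast compared with running a full machine translation system over every document.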
Cite this article
Chen, A., Gey, F.C. Multilingual Information Retrieval Using Machine Translation, Relevance Feedback and Decompounding. Information Retrieval 7, 149–182 (2004). https://doi.org/10.1023/B:INRT.0000009444.89549.90
DOI: https://doi.org/10.1023/B:INRT.0000009444.89549.90