Abstract
The problem of finding documents written in a language that the searcher cannot read is perhaps the most challenging application of cross-language information retrieval technology. In interactive applications, the task involves at least two steps: (1) the machine locates promising documents in a collection that is larger than the searcher could scan, and (2) the searcher recognizes documents relevant to their intended use from among those nominated by the machine. This article presents the results of experiments designed to explore three techniques for supporting interactive relevance assessment: (1) full machine translation, (2) rapid term-by-term translation, and (3) focused phrase translation. Full machine translation was found to support this task better than term-by-term translation, and focused phrase translation further improved recall without an adverse effect on precision. The article concludes with an assessment of the strengths and weaknesses of the evaluation framework used in this study and some remarks on the implications of these results for future evaluation campaigns.
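The comparison above is stated in terms of precision and recall of the searcher's relevance judgments. As a minimal sketch, assuming that each condition yields a set of documents the searcher marked relevant and that reference relevance judgments are available, the two measures and van Rijsbergen's F measure could be computed as follows. The function name `selection_scores`, the document identifiers, and the example sets are hypothetical and are not data from the study.

```python
from typing import Dict, Iterable, Set


def selection_scores(selected: Iterable[str], relevant: Iterable[str],
                     alpha: float = 0.5) -> Dict[str, float]:
    """Score a searcher's selections against reference relevance judgments.

    Hypothetical helper for illustration only.
    """
    chosen: Set[str] = set(selected)
    truth: Set[str] = set(relevant)
    hits = len(chosen & truth)  # documents both selected and judged relevant
    precision = hits / len(chosen) if chosen else 0.0
    recall = hits / len(truth) if truth else 0.0
    # van Rijsbergen's F = 1 / (alpha/P + (1 - alpha)/R);
    # alpha = 0.5 gives the harmonic mean, larger alpha favors precision.
    if precision == 0.0 or recall == 0.0:
        f = 0.0
    else:
        f = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    return {"precision": precision, "recall": recall, "f": f}


if __name__ == "__main__":
    # Hypothetical judgments: four relevant documents for one topic.
    relevant = {"d1", "d2", "d3", "d4"}
    # Condition A finds one relevant document out of two selected.
    print(selection_scores({"d1", "d5"}, relevant))
    # Condition B selects more documents at the same precision,
    # so recall rises while precision is unchanged.
    print(selection_scores({"d1", "d2", "d5", "d6"}, relevant))
```

In this made-up example, condition A scores precision 0.5 and recall 0.25, while condition B keeps precision at 0.5 and raises recall to 0.5, which is the kind of pattern described as improving recall without an adverse effect on precision.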
Cite this article
Oard, D.W., Gonzalo, J., Sanderson, M. et al. Interactive Cross-Language Document Selection. Information Retrieval 7, 205–228 (2004). https://doi.org/10.1023/B:INRT.0000009446.22036.e3