Information Retrieval, Volume 9, Issue 1, pp 71–93

A merging strategy proposal: The 2-step retrieval status value method

  • Fernando Martínez-Santiago
  • L. Alfonso Ureña-López
  • Maite Martín-Valdivia

Abstract

A common strategy for implementing CLIR (Cross-Language Information Retrieval) systems is the so-called query translation approach. The user query is translated into each language present in the multilingual collection, and an independent monolingual retrieval process is run per language. This approach therefore partitions the documents by language, yielding as many collections as there are languages. After searching these corpora and obtaining one result list per language, we must merge the lists to provide a single list of retrieved articles.
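To make the merging problem concrete, the following is a minimal sketch of two simple ways to combine per-language result lists: round-robin interleaving and sorting by raw score. It is an assumption that these are representative of simple merging baselines; the abstract does not name specific methods, and the function names and toy document identifiers are hypothetical.

```python
from itertools import zip_longest


def round_robin_merge(ranked_lists):
    """Interleave per-language result lists, taking one document from each in turn."""
    merged = []
    for tier in zip_longest(*ranked_lists):
        # zip_longest pads exhausted lists with None; skip the padding.
        merged.extend(doc for doc in tier if doc is not None)
    return merged


def raw_score_merge(scored_lists):
    """Pool (doc_id, score) pairs from every language and sort by score, highest first.

    Assumes scores from different monolingual runs are directly comparable,
    which is generally not guaranteed.
    """
    pooled = [pair for lst in scored_lists for pair in lst]
    ranked = sorted(pooled, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked]
```

Both functions treat each monolingual run as a black box, so neither can correct for score distributions that differ from one language collection to another.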

In this paper, we propose an approach for obtaining a single list of relevant documents in CLIR systems driven by query translation. This approach, which we call 2-step RSV (RSV: Retrieval Status Value), is based on re-indexing the retrieved documents according to the query vocabulary, and it performs noticeably better than traditional methods.
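The re-indexing idea can be illustrated with a minimal sketch, which should not be read as the authors' implementation. It assumes that each translated query term is mapped to the original query term it came from (called a "concept" here), that the documents returned by the monolingual runs are pooled and re-indexed over those concepts only, and that a plain TF-IDF style weight stands in for whatever weighting scheme the paper actually uses. The function name, the parameter names, and the scoring formula are all hypothetical.

```python
import math
from collections import Counter


def two_step_rsv(retrieved, alignment):
    """Rank pooled multilingual documents by a concept-level score.

    retrieved: dict mapping doc_id -> list of tokens, for the documents
               returned by the first-step monolingual runs (any language).
    alignment: dict mapping each translated query term -> the original
               query term ("concept") it is aligned to.
    """
    # Re-index each retrieved document using only the query vocabulary,
    # replacing every translated term by its aligned concept.
    concept_tf = {
        doc_id: Counter(alignment[tok] for tok in tokens if tok in alignment)
        for doc_id, tokens in retrieved.items()
    }

    # Document frequency of each concept over the pooled documents.
    n_docs = len(concept_tf)
    df = Counter(concept for tf in concept_tf.values() for concept in tf)

    # Assumed weighting: term frequency times a smoothed inverse document frequency.
    def score(tf):
        return sum(freq * math.log(1 + n_docs / df[c]) for c, freq in tf.items())

    scores = {doc_id: score(tf) for doc_id, tf in concept_tf.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because every document is scored against the same concept-level index, the resulting scores are comparable across languages, which is the property that merging independent per-language scores lacks.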

The proposed method requires query vocabulary alignment: for each word in a given query, we must know its translation or translations into the other languages. Because this is not always possible, we have investigated a mixed model, which handles queries with only partial word-level alignment. The results show that even in this scenario, 2-step RSV performs better than traditional merging methods.

Keywords

CLIR, Merging strategies, Pseudo-relevance feedback, 2-step RSV, Mixed 2-step RSV

Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  • Fernando Martínez-Santiago (1)
  • L. Alfonso Ureña-López (1)
  • Maite Martín-Valdivia (1)

  1. Department of Computer Science, University of Jaén, Jaén, Spain
