Ranking Multilingual Documents Using Minimal Language Dependent Resources

  • G. S. K. Santosh
  • N. Kiran Kumar
  • Vasudeva Varma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6609)

Abstract

This paper proposes an approach for extracting simple and effective features that enhance multilingual document ranking (MLDR). There is limited prior research on capturing the concept of multilingual document similarity when determining the ranking of documents. Moreover, the existing work relies heavily on language-specific tools, making it hard to reimplement for other languages. Our approach extracts various multilingual and monolingual similarity features using a basic language resource (a bilingual dictionary). No language-specific tools are used, which makes the approach extensible to other languages. We used the datasets provided by the Forum for Information Retrieval Evaluation (FIRE) for its 2010 Adhoc Cross-Lingual document retrieval task on Indian languages. Experiments were performed with different ranking algorithms, and their results are compared. The results obtained demonstrate the effectiveness of the features considered in enhancing multilingual document ranking.
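As a rough illustration of what a dictionary-based similarity feature could look like, the sketch below combines a bilingual dictionary lookup with Levenshtein edit distance (one of the listed keywords). It is not the authors' feature set, which the abstract does not specify. The function names `levenshtein` and `translated_overlap`, the `max_dist` threshold, and the toy dictionary entries are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def translated_overlap(query_terms, doc_terms, dictionary, max_dist=1):
    """Hypothetical feature: fraction of query terms with a dictionary
    translation within `max_dist` edits of some document term.
    Terms missing from the dictionary are compared as-is."""
    if not query_terms:
        return 0.0
    matched = 0
    for term in query_terms:
        candidates = dictionary.get(term, [term])
        if any(levenshtein(c, d) <= max_dist
               for c in candidates for d in doc_terms):
            matched += 1
    return matched / len(query_terms)


# Toy, made-up dictionary entries (romanized for readability).
toy_dictionary = {"water": ["paani"], "river": ["nadee"]}
score = translated_overlap(["water", "river"], ["pani", "nadi"], toy_dictionary)
print(score)
```

The fuzzy match tolerates small spelling variation between a dictionary translation and the form that appears in a document, so no stemmer or other language-specific tool is needed. A score like this would be one feature among several passed to a ranking algorithm.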

Keywords

Multilingual Document Ranking; Feature Engineering; Wikipedia; Levenshtein Edit Distance

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • G. S. K. Santosh
    • 1
  • N. Kiran Kumar
    • 1
  • Vasudeva Varma
    • 1
  1. International Institute of Information Technology, Hyderabad, India
