Bengali and Hindi to English CLIR Evaluation

  • Debasis Mandal
  • Mayank Gupta
  • Sandipan Dandapat
  • Pratyush Banerjee
  • Sudeshna Sarkar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5152)

Abstract

This paper presents a cross-language retrieval system that retrieves English documents in response to queries in Bengali and Hindi, developed as part of our participation in the CLEF 2007 Ad-hoc bilingual track. We followed a dictionary-based Machine Translation approach to generate the equivalent English query from the Indian language topics. Our main challenge was working with a limited-coverage dictionary (roughly 20% coverage) for Hindi-English and a virtually non-existent dictionary for Bengali-English, so we depended mostly on a phonetic transliteration system to compensate. The CLEF results point to the need for a rich bilingual lexicon, a translation disambiguator, a Named Entity Recognizer and a better transliterator for CLIR involving Indian languages. The best MAP values in our experiments were 7.26% for Bengali and 4.77% for Hindi CLIR, which are 20% and 13% of our best monolingual retrieval, respectively.
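The dictionary-plus-transliteration idea described in the abstract can be illustrated with a minimal sketch. It is not the system evaluated in the paper: the toy dictionary, the romanized query tokens, and the names `translate_query`, `dictionary_coverage` and `toy_transliterate` are all assumptions made for illustration, and the transliterator is only a pass-through placeholder.

```python
from typing import Callable, Dict, List

# Hypothetical toy bilingual dictionary; tokens are romanized for readability.
# A real system would look up native-script (Devanagari or Bengali) tokens.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "chunav": ["election"],
    "sarkar": ["government", "administration"],
}


def toy_transliterate(token: str) -> str:
    """Placeholder for a phonetic transliterator (assumption: pass-through)."""
    return token


def translate_query(
    tokens: List[str],
    dictionary: Dict[str, List[str]],
    transliterate: Callable[[str], str],
) -> List[str]:
    """Translate each token by dictionary lookup; transliterate on a miss."""
    english_terms: List[str] = []
    for token in tokens:
        candidates = dictionary.get(token)
        if candidates:
            # No disambiguation step here: every listed sense is kept.
            english_terms.extend(candidates)
        else:
            # Out-of-dictionary token (for example a named entity).
            english_terms.append(transliterate(token))
    return english_terms


def dictionary_coverage(tokens: List[str], dictionary: Dict[str, List[str]]) -> float:
    """Fraction of query tokens that have a dictionary entry."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in dictionary) / len(tokens)


if __name__ == "__main__":
    query = ["bharat", "chunav", "sarkar"]
    print(translate_query(query, TOY_DICTIONARY, toy_transliterate))
    print(dictionary_coverage(query, TOY_DICTIONARY))
```

With low dictionary coverage, most tokens take the transliteration branch, which is why the quality of the transliterator and the absence of a disambiguator matter so much for the final English query.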

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Debasis Mandal (1)
  • Mayank Gupta (1)
  • Sandipan Dandapat (1)
  • Pratyush Banerjee (1)
  • Sudeshna Sarkar (1)

  1. Department of Computer Science and Engineering, IIT Kharagpur, India
