Exploiting Multiple Translation Resources for English-Persian Cross Language Information Retrieval

  • Hosein Azarbonyad
  • Azadeh Shakery
  • Heshaam Faili
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8138)


A key issue in Cross Language Information Retrieval (CLIR) that affects system performance is how to exploit the available translation resources. The issue is more challenging for a language that lacks appropriate translation resources. Another factor that affects the performance of a CLIR system is the degree of ambiguity of the query words. In this paper, we propose combining different translation resources for CLIR. We also propose two methods that exploit phrases in the query translation process to resolve the ambiguity of query words. Our evaluation results on English-Persian CLIR show that the phrase-based and combined-resource translation methods outperform the other CLIR methods.
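As a rough illustration of what combining translation resources can look like, the sketch below linearly interpolates the translation probabilities that two resources assign to the candidates for a single query word. This is not taken from the paper: the interpolation scheme, the function name `combine_translations`, the resource weights, and the transliterated candidates with their probabilities are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict


def combine_translations(resources, weights):
    """Linearly interpolate per-resource translation probabilities.

    resources: list of dicts mapping a candidate translation to
               P(candidate | query word) under one resource.
    weights:   one non-negative weight per resource (hypothetical values).
    """
    if len(resources) != len(weights):
        raise ValueError("need exactly one weight per resource")

    combined = defaultdict(float)
    for table, weight in zip(resources, weights):
        for candidate, prob in table.items():
            combined[candidate] += weight * prob

    # Renormalize so the combined scores form a probability distribution.
    total = sum(combined.values())
    if total == 0:
        return {}
    return {candidate: score / total for candidate, score in combined.items()}


# Made-up candidates and probabilities for one English query word.
dictionary_probs = {"ketab": 0.5, "daftar": 0.5}
parallel_corpus_probs = {"ketab": 0.9, "nameh": 0.1}

combined = combine_translations(
    [dictionary_probs, parallel_corpus_probs],
    [0.4, 0.6],  # assumed weights, not tuned values
)
best_candidate = max(combined, key=combined.get)
```

In this toy setup a candidate supported by both resources ends up with the highest combined score, which is the intuition behind pooling resources; how the actual system weights or merges its resources is not specified in the abstract.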


Keywords: Cross Language Information Retrieval; English-Persian CLIR; Phrase Based Query Translation; Combining Translation Resources for CLIR





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hosein Azarbonyad (1)
  • Azadeh Shakery (1)
  • Heshaam Faili (1)
  1. School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
