Reranking Hypotheses of Machine-Translated Queries for Cross-Lingual Information Retrieval

  • Shadi Saleh
  • Pavel Pecina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9822)


Machine Translation (MT) systems employed to translate queries for Cross-Lingual Information Retrieval typically produce a single translation with maximum translation quality. This, however, might not be optimal with respect to retrieval quality, and other translation variants might lead to better retrieval results. In this paper, we explore a method that uses multiple translations produced by an MT system, which are reranked by a supervised machine-learning method trained to directly optimize retrieval quality. We experiment with various types of features, and the results obtained on the medical-domain test collection from the CLEF eHealth Lab series show a significant improvement in retrieval quality compared to a system using the single translation provided by MT.
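The reranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` class, the two-element feature vectors, the linear scorer, and the squared-error gradient-descent trainer are all assumptions chosen for brevity. The sketch only shows the general shape of the approach, in which each translation hypothesis receives a predicted retrieval-quality score and the highest-scoring one is sent to the retrieval system.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Hypothesis:
    text: str              # one translation candidate from the MT n-best list
    features: List[float]  # hypothetical features (e.g. MT score, list rank)


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear estimate of the retrieval quality of one hypothesis."""
    return sum(w * f for w, f in zip(weights, features))


def train(examples: List[Tuple[List[float], float]], n_features: int,
          lr: float = 0.1, epochs: int = 1000) -> List[float]:
    """Fit weights by stochastic gradient descent on squared error.

    Each example pairs a feature vector with the retrieval quality
    (for instance, average precision) observed for that hypothesis.
    This learner is an assumption, not the one used in the paper.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, target in examples:
            error = predict(weights, features) - target
            for i, f in enumerate(features):
                weights[i] -= lr * error * f
    return weights


def rerank(weights: Sequence[float], hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest predicted retrieval quality."""
    return max(hypotheses, key=lambda h: predict(weights, h.features))


# Toy usage with made-up numbers: the second candidate is selected even
# though it is not first in the n-best list.
toy_weights = [0.2, 0.8]
candidates = [
    Hypothesis("candidate ranked first by MT", [0.9, 0.1]),
    Hypothesis("candidate ranked second by MT", [0.5, 0.6]),
]
best = rerank(toy_weights, candidates)
```

In this toy setup the selected query may differ from the MT system's top-ranked translation, which is the situation the abstract argues can improve retrieval results.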


Keywords (machine-generated, not supplied by the authors): Machine Translation, Mean Average Precision, Good Translation, Query Translation, Retrieval Quality



This research was supported by the Czech Science Foundation (grant no. P103/12/G084) and the EU H2020 project KConnect (contract no. 644753).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
