
Zero-Shot Language Transfer for Cross-Lingual Sentence Retrieval Using Bidirectional Attention Model

  • Goran Glavaš
  • Ivan Vulić
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)

Abstract

We present a neural architecture for cross-lingual mate sentence retrieval that encodes sentences in a joint multilingual space and learns to distinguish true translation pairs from semantically related sentences across languages. The proposed model combines a recurrent sequence encoder with a bidirectional attention layer and an intra-sentence attention mechanism. This way, the final fixed-size representation of each sentence in a training pair depends on an attention-based selection of contextualized token representations from the other sentence. The representations of both sentences are then combined with a bilinear product function to predict the relevance score. We show that, coupled with a shared multilingual word embedding space, the proposed model strongly outperforms unsupervised cross-lingual ranking functions, and that further gains can be achieved by combining the two approaches. Most importantly, we demonstrate the model’s effectiveness in zero-shot language transfer settings: our multilingual framework boosts cross-lingual sentence retrieval performance for unseen language pairs without any training examples. This enables robust cross-lingual sentence retrieval for pairs of resource-lean languages as well, without any parallel data.
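The abstract describes a scoring pipeline with four stages: a recurrent encoder over tokens from a shared multilingual embedding space, cross-sentence (bidirectional) attention, intra-sentence attention pooling into fixed-size vectors, and a bilinear relevance score. The following is a minimal PyTorch sketch of that pipeline, assuming pre-computed multilingual token embeddings as input; all module names, dimensions, and the exact attention formulation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code): BiLSTM encoder,
# cross-sentence attention, intra-sentence attention pooling, bilinear scorer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MateRetrievalScorer(nn.Module):
    def __init__(self, emb_dim=300, hidden=256):
        super().__init__()
        # Contextualizes tokens; inputs are pre-trained multilingual embeddings.
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        enc_dim = 2 * hidden
        # Intra-sentence attention scorer: one weight per token position.
        self.self_attn = nn.Linear(2 * enc_dim, 1)
        # Bilinear product combining the two fixed-size sentence vectors.
        self.bilinear = nn.Bilinear(2 * enc_dim, 2 * enc_dim, 1)

    def cross_attend(self, a, b):
        # For each token in `a`, a softmax-weighted sum of `b`'s token states.
        scores = torch.bmm(a, b.transpose(1, 2))           # (B, La, Lb)
        return torch.bmm(F.softmax(scores, dim=-1), b)     # (B, La, enc_dim)

    def pool(self, states):
        # Intra-sentence attention: weighted sum over token positions.
        weights = F.softmax(self.self_attn(states), dim=1)  # (B, L, 1)
        return (weights * states).sum(dim=1)                # (B, 2*enc_dim)

    def forward(self, src_emb, tgt_emb):
        # src_emb, tgt_emb: (B, L, emb_dim) token embeddings from a shared
        # multilingual word embedding space.
        src, _ = self.encoder(src_emb)
        tgt, _ = self.encoder(tgt_emb)
        # Each sentence representation depends on attention over the other one.
        src_ctx = torch.cat([src, self.cross_attend(src, tgt)], dim=-1)
        tgt_ctx = torch.cat([tgt, self.cross_attend(tgt, src)], dim=-1)
        v_src, v_tgt = self.pool(src_ctx), self.pool(tgt_ctx)
        return self.bilinear(v_src, v_tgt).squeeze(-1)      # relevance score


# Usage with random stand-in embeddings (batch of 4, sentence lengths 12 and 15).
model = MateRetrievalScorer()
score = model(torch.randn(4, 12, 300), torch.randn(4, 15, 300))
print(score.shape)  # torch.Size([4])
```

Because the scorer only sees language-agnostic embeddings from the shared space, the same trained weights can in principle be applied to language pairs unseen during training, which is the zero-shot transfer setting the abstract refers to.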

Keywords

Cross-lingual retrieval · Language transfer · Bidirectional attention model · Sentence retrieval

Notes

Acknowledgments

The work described in this paper has been partially supported by the “Eliteprogramm für Postdoktorandinnen und Postdoktoranden” of the Baden-Württemberg Stiftung, within the scope of the AGREE (Algebraic Reasoning over Events from Text and External Knowledge) grant. We thank the anonymous reviewers for their useful comments.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
  2. Language Technology Lab, University of Cambridge, Cambridge, UK
