
Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 781)

Abstract

Domain adaptation consists of adapting Machine Translation (MT) systems designed for one domain to work in another. Multiword expressions generally characterize the vocabularies of specific domains. Translating multiword expressions is a challenge for current Statistical Machine Translation (SMT) systems because corpus-based approaches are effective only when large amounts of parallel corpora are available. However, parallel corpora are available only for a limited number of language pairs and domains, and building corpora for several language pairs and domains is time-consuming and expensive. This paper describes an experimental evaluation of the impact of using a specialized bilingual lexicon of multiword expressions to obtain better domain adaptation for the state-of-the-art statistical machine translation system Moses. Our study concerns the English-French language pair and two kinds of texts: in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents). We introduce three methods to integrate extracted bilingual multiword expressions into Moses. We show experimentally that integrating specialized bilingual lexicons of multiword expressions improves the translation quality of Moses for both in-domain and out-of-domain texts.
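The abstract does not spell out the three integration methods. Purely as an illustration, here is one possible strategy, not necessarily one of the authors' methods: append a binary "is-MWE" score to every phrase pair in a Moses text phrase table (lines of the form `source ||| target ||| scores ||| ...`) and add lexicon pairs that the table is missing. The function names `parse_line` and `add_mwe_feature`, the default scores given to new entries, and the e/1 encoding of the indicator are assumptions made for this sketch.

```python
import math

# Assumption: the decoder takes the natural log of each phrase-table score
# (as Moses does), so storing e yields a feature value of 1 and storing 1 yields 0.
MWE_ON = math.e
MWE_OFF = 1.0


def parse_line(line: str) -> tuple[str, str, list[float], list[str]]:
    """Split a 'source ||| target ||| scores ||| ...' line into its fields."""
    fields = [f.strip() for f in line.split("|||")]
    scores = [float(s) for s in fields[2].split()]
    return fields[0], fields[1], scores, fields[3:]


def format_scores(scores: list[float]) -> str:
    return " ".join(f"{s:g}" for s in scores)


def add_mwe_feature(
    phrase_table: list[str],
    lexicon: set[tuple[str, str]],
    default_scores: list[float],
) -> list[str]:
    """Hypothetical helper: flag phrase pairs found in the bilingual MWE lexicon
    with an extra indicator score, then append lexicon pairs the table lacks
    using caller-supplied (hypothetical) default scores."""
    out = []
    seen = set()
    for line in phrase_table:
        src, tgt, scores, rest = parse_line(line)
        seen.add((src, tgt))
        flag = MWE_ON if (src, tgt) in lexicon else MWE_OFF
        out.append(" ||| ".join([src, tgt, format_scores(scores + [flag])] + rest))
    # Lexicon entries absent from the phrase table become new phrase pairs.
    for src, tgt in sorted(lexicon - seen):
        out.append(" ||| ".join([src, tgt, format_scores(default_scores + [MWE_ON])]))
    return out


if __name__ == "__main__":
    table = [
        "side effect ||| effet secondaire ||| 0.6 0.5 0.4 0.3",
        "effect ||| effet ||| 0.7 0.6 0.5 0.4",
    ]
    lexicon = {
        ("side effect", "effet secondaire"),
        ("adverse reaction", "effet indésirable"),
    }
    for line in add_mwe_feature(table, lexicon, [0.1, 0.1, 0.1, 0.1]):
        print(line)
```

With this approach, the decoder configuration would also have to declare one more phrase-table score, and tuning would then learn a weight for the indicator, letting the system decide how strongly to prefer lexicon-backed translations.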

Keywords

Statistical machine translation · Domain adaptation · Bilingual lexicon · Multiword expression

Notes

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 700381.


Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. CEA, LIST, Vision and Content Engineering Laboratory, Gif-sur-Yvette, France
