Abstract
Domain adaptation consists of adapting Machine Translation (MT) systems designed for one domain to work in another. Multiword expressions generally characterize domain-specific vocabularies. Translating multiword expressions is a challenge for current Statistical Machine Translation (SMT) systems because corpus-based approaches are effective only when large parallel corpora are available. However, parallel corpora are available for only a limited number of language pairs and domains, and building corpora for additional language pairs and domains is time-consuming and expensive. This paper describes an experimental evaluation of the impact of using a specialized bilingual lexicon of multiword expressions to improve domain adaptation for the state-of-the-art statistical machine translation system Moses. Our study concerns the English-French language pair and two kinds of texts: in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents). We introduce three methods to integrate extracted bilingual multiword expressions into Moses. We show experimentally that integrating specialized bilingual lexicons of multiword expressions improves the translation quality of Moses for both in-domain and out-of-domain texts.
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 700381.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Semmar, N., Laib, M. (2018). Integrating Specialized Bilingual Lexicons of Multiword Expressions for Domain Adaptation in Statistical Machine Translation. In: Hasida, K., Pa, W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science, vol 781. Springer, Singapore. https://doi.org/10.1007/978-981-10-8438-6_9
DOI: https://doi.org/10.1007/978-981-10-8438-6_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8437-9
Online ISBN: 978-981-10-8438-6
eBook Packages: Computer Science (R0)