Abstract
Statistical machine translation (SMT) systems based on the log-linear model are usually composed of multiple feature functions, each of which is assigned a weight as a model parameter. In this paper, we start from the observation that different input source sentences may require different model parameters. To adapt the model to each input, we propose a model parameter selection method for log-linear SMT systems. The method relies mainly on the characteristics of the individual feature functions themselves and makes no assumptions about unseen test sets. Experimental results on two language pairs, Chinese-English (Zh-En) and Uyghur-Chinese (Ug-Zh), show that our method improves translation quality by up to 2.4 and 2.2 BLEU points, respectively. The results also show that the proposed method is readily interpretable.
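In a log-linear SMT model, a candidate translation e of a source sentence f is scored as score(e | f) = Σ_i λ_i · h_i(e, f), where h_i are the feature functions and λ_i their weights. Because p(e | f) is proportional to the exponential of this sum, the decoder simply keeps the highest-scoring candidate. The abstract does not spell out how weights are selected per input, so the sketch below is only an illustration of why per-input weights matter, under stated assumptions. Its selection rule is hypothetical and is not the paper's method: several pre-tuned weight vectors are keyed by a representative source sentence, and the one whose key is most similar to the input is chosen. Similarity is measured with the Ratcliff/Obershelp "gestalt" ratio that Python's `difflib.SequenceMatcher` implements. The function names, feature values, and weights are all illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence


def loglinear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Log-linear score sum_i lambda_i * h_i(e, f); maximizing it also maximizes
    p(e | f), which is proportional to exp of this sum."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(w * h for w, h in zip(weights, features))


def best_candidate(candidates: Dict[str, Sequence[float]],
                   weights: Sequence[float]) -> str:
    """Return the candidate translation whose feature vector scores highest."""
    return max(candidates, key=lambda e: loglinear_score(candidates[e], weights))


def select_weights(source: str,
                   weight_sets: Dict[str, Sequence[float]]) -> Sequence[float]:
    """HYPOTHETICAL selection rule (not the paper's): return the weight vector whose
    key sentence is most similar to the input, by token-level Ratcliff/Obershelp ratio."""
    def similarity(ref: str) -> float:
        return SequenceMatcher(None, source.split(), ref.split()).ratio()
    return weight_sets[max(weight_sets, key=similarity)]


if __name__ == "__main__":
    # Toy weight vectors over two features (h_1, h_2), keyed by a representative sentence.
    weight_sets = {
        "the meeting will be held tomorrow": [1.0, 0.5],
        "stock prices fell sharply today": [0.2, 1.5],
    }
    # Toy feature values h_i(e, f) for two candidate translations.
    candidates = {"A": [-3.0, -0.5], "B": [-1.0, -2.0]}

    weights = select_weights("the meeting will be held next week", weight_sets)
    print(weights)                                 # [1.0, 0.5]
    print(best_candidate(candidates, weights))     # B
    print(best_candidate(candidates, [0.2, 1.5]))  # A
```

With these toy numbers the input is closest to the first key, so its weights are selected and candidate B wins (-2.0 against -3.25). Under the other weight vector, candidate A would win instead. The same feature values therefore yield different best translations depending on which parameters are chosen for the input.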
This work was supported by the National Key Research and Development Program of China (Grant No. 2017YFB1002103) and the National Natural Science Foundation of China (No. 61732005).
Notes
1. The corpora include LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Shi, X., Huang, H., Jian, P., Tang, YK. (2018). Training Set Similarity Based Parameter Selection for Statistical Machine Translation. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_6
DOI: https://doi.org/10.1007/978-3-319-96890-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96889-6
Online ISBN: 978-3-319-96890-2
eBook Packages: Computer Science, Computer Science (R0)