Training Set Similarity Based Parameter Selection for Statistical Machine Translation

Shi, Xuewen; Huang, Heyan; Jian, Ping; Tang, Yi-Kun

doi:10.1007/978-3-319-96890-2_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10987)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

Statistical machine translation (SMT) systems based on log-linear models are usually composed of multiple feature functions, each of which is assigned a weight as a model parameter. In this paper, we argue that different source sentences may call for different model parameters. To adapt the model to different inputs, we propose a parameter selection method for log-linear SMT systems. The method relies mainly on the characteristics of the feature functions themselves and makes no assumptions about unseen test sets. Experimental results on two language pairs (Zh-En and Ug-Zh) show improvements of up to 2.4 and 2.2 BLEU points, respectively, and also show that the proposed method has good interpretability.
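The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' method, whose details are not given here: it only shows a log-linear score as a weighted sum of feature values, and one hypothetical way to choose between two pre-tuned weight sets per input sentence. The feature names (`tm`, `lm`), the weight values, the threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def loglinear_score(features, weights):
    """Unnormalized log-linear score: the weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


def training_similarity(sentence, training_sources):
    """Highest character-level similarity between the input and any training sentence."""
    return max(SequenceMatcher(None, sentence, t).ratio() for t in training_sources)


def select_weights(sentence, training_sources, near, far, threshold=0.5):
    """Pick one of two pre-tuned weight sets from the input's training-set similarity."""
    return near if training_similarity(sentence, training_sources) >= threshold else far


def translate(sentence, hypotheses, training_sources, near, far):
    """Return the hypothesis text that scores highest under the selected weights."""
    weights = select_weights(sentence, training_sources, near, far)
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))["text"]


# Toy data (all values are made up for illustration).
TRAINING_SOURCES = ["the cat sat on the mat"]
NEAR_WEIGHTS = {"tm": 1.0, "lm": 0.2}  # hypothetical: favor the translation model
FAR_WEIGHTS = {"tm": 0.2, "lm": 1.0}   # hypothetical: favor the language model
HYPOTHESES = [
    {"text": "A", "features": {"tm": -1.0, "lm": -4.0}},
    {"text": "B", "features": {"tm": -3.0, "lm": -1.5}},
]
```

With these toy numbers, an input identical to a training sentence selects `NEAR_WEIGHTS` and ranks hypothesis A first, while an unrelated input selects `FAR_WEIGHTS` and ranks hypothesis B first. The point is only that the same candidate list can be reranked differently once the weights depend on the input.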

This work was supported by the National Key Research and Development Program of China (Grant No. 2017YFB1002103) and the National Natural Science Foundation of China (No. 61732005).

Notes

1. The corpora include LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08 and LDC2005T06.
2. http://ee.dlut.edu.cn/CWMT2017/evaluation_en.html

Author information

Authors and Affiliations

Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, People’s Republic of China
Xuewen Shi, Heyan Huang, Ping Jian & Yi-Kun Tang

Corresponding author

Correspondence to Ping Jian.

Editor information

Editors and Affiliations

South China University of Technology, Guangzhou, China
Yi Cai
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Jianliang Xu

About this paper

Cite this paper

Shi, X., Huang, H., Jian, P., Tang, Y.-K. (2018). Training Set Similarity Based Parameter Selection for Statistical Machine Translation. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_6

DOI: https://doi.org/10.1007/978-3-319-96890-2_6
Published: 19 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96889-6
Online ISBN: 978-3-319-96890-2
eBook Packages: Computer Science, Computer Science (R0)
