Training Set Similarity Based Parameter Selection for Statistical Machine Translation

  • Conference paper
Web and Big Data (APWeb-WAIM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10987)

Abstract

Statistical machine translation (SMT) systems based on the log-linear model are usually composed of multiple feature functions, each of which is assigned a weight as a model parameter. In this paper, we argue that different input source sentences may call for different model parameters. To adapt the model to each input, we propose a parameter selection method for log-linear SMT systems. The method relies mainly on the characteristics of the individual feature functions and makes no assumptions about unseen test sets. Experimental results on two language pairs (Zh-En and Ug-Zh) show that our method yields improvements of up to 2.4 and 2.2 BLEU points, respectively, and they also demonstrate the good interpretability of the proposed method.

This work was supported by the National Key Research and Development Program of China (Grant No. 2017YFB1002103) and the National Natural Science Foundation of China (No. 61732005).

Notes

  1. The corpora include LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06.

  2. http://ee.dlut.edu.cn/CWMT2017/evaluation_en.html.

Author information

Corresponding author

Correspondence to Ping Jian.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Shi, X., Huang, H., Jian, P., Tang, YK. (2018). Training Set Similarity Based Parameter Selection for Statistical Machine Translation. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science(), vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_6

  • DOI: https://doi.org/10.1007/978-3-319-96890-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96889-6

  • Online ISBN: 978-3-319-96890-2

  • eBook Packages: Computer Science, Computer Science (R0)
