Abstract

At present, research on Tibetan machine translation focuses mainly on the Tibetan-Chinese direction, while research on Chinese-Tibetan machine translation remains almost blank. In this paper, a neural machine translation model is applied to the Chinese-Tibetan translation task for the first time, and a syntax tree is also introduced into the Chinese-Tibetan neural machine translation model for the first time, achieving good translation quality. In addition, our preprocessing applies syllable segmentation to the Tibetan corpus and character segmentation to the Chinese corpus, which performs better than word segmentation on both corpora. The experimental results show that the neural translation model based entirely on the self-attention mechanism performs best on the Chinese-Tibetan machine translation task, raising the BLEU score by one percentage point.
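As a rough illustration of the preprocessing described above, the sketch below (Python; not the authors' released code, and the function names are ours) splits Tibetan text into syllables on the tsheg delimiter (U+0F0B) and splits Chinese text into individual characters. Handling of Tibetan punctuation such as the shad is omitted for brevity.

```python
# Minimal preprocessing sketch: syllable segmentation for Tibetan and
# character segmentation for Chinese, as described in the abstract.
# This is an illustrative assumption of how such segmentation can be done,
# not the paper's actual pipeline.

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)


def segment_tibetan_syllables(sentence: str) -> list[str]:
    """Split a Tibetan sentence into syllables on the tsheg mark."""
    return [s for s in sentence.replace(" ", TSHEG).split(TSHEG) if s]


def segment_chinese_characters(sentence: str) -> list[str]:
    """Split a Chinese sentence into individual characters, dropping whitespace."""
    return [ch for ch in sentence if not ch.isspace()]


if __name__ == "__main__":
    print(segment_tibetan_syllables("བོད་ཡིག"))   # ['བོད', 'ཡིག']
    print(segment_chinese_characters("机器翻译"))  # ['机', '器', '翻', '译']
```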

Keywords

Neural machine translation · Tibetan · Syntactic tree · Attention

Acknowledgement

This work is supported by the National Natural Science Foundation of China (61331013).


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. National Language Resource Monitoring and Research Center of Minority Languages, Minzu University of China, Beijing, China