Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora

  • Conference paper

Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

The Chinese language has evolved considerably over its long history, and native speakers today often have trouble reading sentences written in ancient Chinese. In this paper, we propose an end-to-end neural model that automatically translates between ancient and contemporary Chinese. However, the existing ancient-contemporary Chinese parallel corpora are not aligned at the sentence level, and sentence-aligned corpora are limited, which makes the model difficult to train. To build sentence-level parallel training data, we propose an unsupervised algorithm that constructs sentence-aligned ancient-contemporary pairs by exploiting the fact that an aligned sentence pair shares many of its tokens. Based on the aligned corpus, we propose an end-to-end neural model with a copying mechanism and local attention to translate between ancient and contemporary Chinese. Experiments show that the proposed unsupervised algorithm achieves a 99.4% F1 score for sentence alignment, and the translation model achieves 26.95 BLEU from ancient to contemporary Chinese and 36.34 BLEU from contemporary to ancient Chinese.
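The unsupervised alignment described above rests on token overlap between an ancient sentence and its contemporary translation, since ancient and contemporary Chinese share much of their character inventory. As a minimal sketch of that idea (not the authors' actual algorithm, whose details the abstract does not give), candidate pairs can be scored by character-level Jaccard similarity and kept when the score clears a threshold; the `char_overlap`, `align`, and `threshold` names are illustrative only:

```python
def char_overlap(a: str, b: str) -> float:
    """Jaccard similarity over the character sets of two sentences."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def align(ancient: list[str], modern: list[str], threshold: float = 0.3):
    """Greedily pair each ancient sentence with its best-overlapping
    contemporary candidate, keeping only pairs above the threshold."""
    pairs = []
    for i, src in enumerate(ancient):
        j, score = max(
            ((j, char_overlap(src, tgt)) for j, tgt in enumerate(modern)),
            key=lambda t: t[1],
        )
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs

# Toy example: the first pair shares 学/时/习, the second shares almost nothing.
pairs = align(["学而时习之", "不亦说乎"], ["学了又时常温习", "不也是很愉快吗"])
```

A real aligner would additionally enforce monotonic sentence order within a document (as in classic length-based alignment) rather than matching greedily, but the overlap score is the signal the paper's premise relies on.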



Author information

Corresponding author

Correspondence to Qi Su.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Li, W., Su, Q. (2019). Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora. In: Tang, J., Kan, M.Y., Zhao, D., Li, S., Zan, H. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol. 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_13

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science (R0)
