Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora

  • Conference paper

Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

The Chinese language has evolved considerably over its long history, and native speakers today often have trouble reading sentences written in ancient Chinese. In this paper, we propose an end-to-end neural model that automatically translates between ancient and contemporary Chinese. However, the existing ancient-contemporary Chinese parallel corpora are not aligned at the sentence level, and sentence-aligned corpora are limited, which makes the model difficult to train. To build sentence-level parallel training data, we propose an unsupervised algorithm that constructs sentence-aligned ancient-contemporary pairs by exploiting the fact that an aligned sentence pair shares many of its tokens. Based on the aligned corpus, we propose an end-to-end neural model with a copying mechanism and local attention to translate between ancient and contemporary Chinese. Experiments show that the proposed unsupervised algorithm achieves a 99.4% F1 score for sentence alignment, and the translation model achieves 26.95 BLEU from ancient to contemporary Chinese and 36.34 BLEU from contemporary to ancient Chinese.
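The unsupervised alignment described above rests on token overlap between an ancient sentence and its contemporary translation, since ancient and contemporary Chinese share much of their character inventory. As a minimal sketch of that idea (not the authors' actual algorithm, whose details the abstract does not give), candidate pairs can be scored by character-level Jaccard similarity and kept when the score clears a threshold; the `char_overlap`, `align`, and `threshold` names are illustrative only:

```python
def char_overlap(a: str, b: str) -> float:
    """Jaccard similarity over the character sets of two sentences."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def align(ancient: list[str], modern: list[str], threshold: float = 0.3):
    """Greedily pair each ancient sentence with its best-overlapping
    contemporary candidate, keeping only pairs above the threshold."""
    pairs = []
    for i, src in enumerate(ancient):
        j, score = max(
            ((j, char_overlap(src, tgt)) for j, tgt in enumerate(modern)),
            key=lambda t: t[1],
        )
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs

# Toy example: the first pair shares 学/时/习, the second shares almost nothing.
pairs = align(["学而时习之", "不亦说乎"], ["学了又时常温习", "不也是很愉快吗"])
```

A real aligner would additionally enforce monotonic sentence order within a document (as in classic length-based alignment) rather than matching greedily, but the overlap score is the signal the paper's premise relies on.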



Author information

Corresponding author

Correspondence to Qi Su.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Li, W., Su, Q. (2019). Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora. In: Tang, J., Kan, M.Y., Zhao, D., Li, S., Zan, H. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol. 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_13

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science (R0)
