Improving Conversational Spoken Language Machine Translation via Pronoun Recovery

Hu, Yanlin; Huang, Heyan; Jian, Ping; Guo, Yuhang

doi:10.1007/978-981-10-0080-5_20


Part of the book series: Communications in Computer and Information Science (CCIS, volume 568)

Included in the following conference series:

Chinese National Conference on Social Media Processing


Abstract

Machine translation for social communication is needed in daily life. However, spoken language translation faces many challenges, especially in translating zero pronouns, which are absent in the source language but appear in the target language. Pronoun dropping severely affects machine translation from pro-drop languages such as Chinese into other languages, and the phenomenon occurs more frequently in conversational spoken language. To address this problem, we recover the missing pronouns by inserting them at their positions on the source side, then use a word alignment method to filter them, keeping only the pronouns that actually help machine translation. We achieve improvements in translation on chat, message, and telephone conversation corpora.
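The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of "filter by word alignment": a recovered source pronoun is kept only if it is aligned to a pronoun on the target side. The function name `filter_recovered_pronouns`, the `TARGET_PRONOUNS` set, and the input representation are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical English pronoun inventory; the paper's actual list is not given here.
TARGET_PRONOUNS: Set[str] = {
    "i", "you", "he", "she", "it", "we", "they",
    "me", "him", "her", "us", "them",
}


def filter_recovered_pronouns(
    source: List[str],
    inserted: Set[int],
    target: List[str],
    alignment: Set[Tuple[int, int]],
) -> List[str]:
    """Keep an inserted source pronoun only if it aligns to a target pronoun.

    source    -- source tokens after candidate pronouns were inserted
    inserted  -- indices in `source` that hold inserted (recovered) pronouns
    target    -- target-side tokens
    alignment -- (source_index, target_index) links from any word aligner
    """
    kept = []
    for i, token in enumerate(source):
        if i in inserted:
            # Collect the target words linked to this recovered pronoun.
            aligned = [target[j].lower() for (s, j) in alignment if s == i]
            if not any(t in TARGET_PRONOUNS for t in aligned):
                continue  # assumed criterion: no pronoun counterpart, so drop it
        kept.append(token)
    return kept


# Toy example: two candidate pronouns were inserted (indices 0 and 3);
# only the first is aligned to a target pronoun.
source = ["我", "吃", "了", "它"]
target = ["I", "ate"]
links = {(0, 0), (1, 1)}
result = filter_recovered_pronouns(source, {0, 3}, target, links)
```

In this toy case the unaligned candidate at index 3 is discarded and the rest of the sentence is left untouched. A real system would obtain the alignment links from a trained aligner rather than writing them by hand.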


Notes

1.
This corpus comes from the DARPA Broad Operational Language Translation (BOLT) Program, which includes message, chat, and telephone conversation parallel data sets. The website is https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/bolt_1.pdf.


Acknowledgments

This work is supported by the National Basic Research Program of China (973 Program, Grant No. 2013CB329303), the National Natural Science Foundation of China (Grant Nos. 61132009 and 61202244), and the Beijing Institute of Technology Research Fund Program for Young Scholars.

Author information

Authors and Affiliations

Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Department of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
Yanlin Hu, Heyan Huang, Ping Jian & Yuhang Guo


Corresponding author

Correspondence to Yanlin Hu.

Editor information

Editors and Affiliations

South China University of Technology, Guangzhou, China
Xichun Zhang
Tsinghua University, Beijing, China
Maosong Sun
South China University of Technology, Guangzhou, China
Zhenyu Wang
Fudan University, Shanghai, China
Xuanjing Huang


About this paper

Cite this paper

Hu, Y., Huang, H., Jian, P., Guo, Y. (2015). Improving Conversational Spoken Language Machine Translation via Pronoun Recovery. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_20


DOI: https://doi.org/10.1007/978-981-10-0080-5_20
Published: 12 November 2015
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0079-9
Online ISBN: 978-981-10-0080-5
eBook Packages: Computer Science, Computer Science (R0)
