Abstract
Machine translation for social communication is needed in daily life. However, spoken language translation faces many challenges, especially the translation of zero pronouns, which are absent in the source language but appear in the target language. Pronoun dropping severely affects machine translation from pro-drop languages such as Chinese into other languages, and the phenomenon occurs more frequently in conversational spoken language. To address this problem, we insert the missing pronouns at their positions on the source side, then use word alignment to filter them, keeping only the pronouns that actually help machine translation. We achieve improvements in translation on chat, message, and telephone conversation corpora.
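The filtering step can be illustrated with a minimal sketch. The abstract does not state the exact filtering criterion, so the rule below (keep a recovered pronoun only if word alignment links it to a pronoun on the target side) is an assumption, and the function name `filter_recovered_pronouns`, the `TARGET_PRONOUNS` list, and the toy sentences are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical English pronoun list, used only for this illustration.
TARGET_PRONOUNS: Set[str] = {
    "i", "you", "he", "she", "it", "we", "they",
    "me", "him", "her", "us", "them",
}


def filter_recovered_pronouns(
    source: List[str],
    inserted: Set[int],
    target: List[str],
    alignment: List[Tuple[int, int]],
) -> List[str]:
    """Drop inserted source pronouns that do not align to a target pronoun.

    source    -- source tokens, including the recovered pronouns
    inserted  -- indices in `source` that were inserted by pronoun recovery
    target    -- target-side tokens
    alignment -- (source_index, target_index) word alignment links
    """
    # Source positions that have at least one link to a target pronoun.
    aligned_to_pronoun = {
        s for s, t in alignment if target[t].lower() in TARGET_PRONOUNS
    }
    # Original tokens are always kept; inserted ones must pass the filter
    # (assumed criterion, not taken from the paper).
    return [
        tok
        for i, tok in enumerate(source)
        if i not in inserted or i in aligned_to_pronoun
    ]


# The recovered pronoun aligns to "I", so it is kept.
kept = filter_recovered_pronouns(
    ["我", "吃", "了"], {0}, ["I", "ate"], [(0, 0), (1, 1), (2, 1)]
)
print(kept)  # ['我', '吃', '了']

# The recovered pronoun has no aligned target pronoun, so it is dropped.
dropped = filter_recovered_pronouns(
    ["它", "下雨", "了"], {0}, ["rain", "started"], [(1, 0), (2, 1)]
)
print(dropped)  # ['下雨', '了']
```

In a real pipeline the alignment links would come from a word aligner run over the parallel training data; here they are written by hand to keep the example self-contained.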
Notes
- 1.
This corpus comes from the DARPA Broad Operational Language Translation (BOLT) Program, which includes message, chat, and telephone conversation parallel data sets. The website is https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/bolt_1.pdf.
Acknowledgments
This work is supported by the National Basic Research Program of China (973 Program, Grant No. 2013CB329303), the National Natural Science Foundation of China (Grant Nos. 61132009 and 61202244), and the Beijing Institute of Technology Research Fund Program for Young Scholars.
Copyright information
© 2015 Springer Science+Business Media Singapore
About this paper
Cite this paper
Hu, Y., Huang, H., Jian, P., Guo, Y. (2015). Improving Conversational Spoken Language Machine Translation via Pronoun Recovery. In: Zhang, X., Sun, M., Wang, Z., Huang, X. (eds) Social Media Processing. SMP 2015. Communications in Computer and Information Science, vol 568. Springer, Singapore. https://doi.org/10.1007/978-981-10-0080-5_20
DOI: https://doi.org/10.1007/978-981-10-0080-5_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0079-9
Online ISBN: 978-981-10-0080-5
eBook Packages: Computer Science, Computer Science (R0)