Abstract
Translating between languages with different word orders poses additional difficulties. In this chapter, we introduce some of these difficulties and describe two syntax-based approaches to addressing them. First, we describe an approach that exploits regularities in how phrase head positions differ between Chinese and Japanese, and we formalize rules that reorder branches of constituency trees. Second, we propose an approach that compensates for the differences in the typical positions of the Subject (S), the Verb (V), and the Object (O) between Chinese (SVO) and Japanese (SOV), and we devise rules that reorder word blocks in dependency trees. Both approaches are implemented as pre-reordering methods, and we evaluate their impact on the translation quality of a phrase-based machine translation system in the news and patent domains. Because both approaches rely on syntactic structures that are extracted automatically by parsers, they are sensitive to parse errors. We analyze the effect of these parse errors and obtain upper bounds on the translation performance that can be achieved with these syntax-based pre-reordering methods.
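To make the idea of pre-reordering concrete, the following minimal Python sketch moves a verb behind its right-hand dependents in a toy dependency tree, so that an SVO source sentence is linearized in SOV order before translation. It is an illustration only and not the rule set developed in this chapter: the `Node` class, the `pre_reorder` function, the coarse tags "V" and "N", and the example sentence are all assumptions made for this sketch, and realistic rules would also have to handle particles, punctuation, and parse errors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A word in a toy dependency tree (hypothetical structure)."""
    word: str
    pos: str    # coarse tag; "V" marks a verb in this sketch
    index: int  # position of the word in the source sentence
    children: List["Node"] = field(default_factory=list)


def pre_reorder(node: Node) -> List[str]:
    """Linearize a subtree, placing each verb after all of its dependents."""
    deps = sorted(node.children, key=lambda c: c.index)
    left = [c for c in deps if c.index < node.index]
    right = [c for c in deps if c.index > node.index]

    out: List[str] = []
    for child in left:
        out.extend(pre_reorder(child))
    if node.pos == "V":
        # Toy SVO -> SOV rule: emit right-hand dependents before the verb.
        for child in right:
            out.extend(pre_reorder(child))
        out.append(node.word)
    else:
        # Non-verbal heads keep their original position.
        out.append(node.word)
        for child in right:
            out.extend(pre_reorder(child))
    return out


# 我(wo3, I) 吃(chi1, eat) 苹果(ping2guo3, apple): subject, verb, object.
root = Node("吃", "V", 1, [Node("我", "N", 0), Node("苹果", "N", 2)])
reordered = pre_reorder(root)
print(" ".join(reordered))
```

For this input the sketch prints the words in the order 我 苹果 吃, that is, subject, object, verb. A phrase-based system trained on source text reordered in this way would then need fewer long-distance reorderings at decoding time.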
Notes
- 1.
They produce target phrases that correspond to source phrases at a very different relative position.
- 3.
In the text, we give each Chinese word followed, in parentheses, by its Pinyin with a tone number and its English translation, e.g., 我(wo3, I). Chinese has five tones in total, numbered 0, 1, 2, 3, and 4.
- 6.
However, whether Chinese is a head-initial or a head-final language is still open to debate because of its flexible word order (Gao 2008). Nevertheless, written Chinese behaves primarily as a head-initial language.
References
Badr, Ibrahim, Rabih Zbib, and James Glass. 2009. Syntactic phrase reordering for English-to-Arabic statistical machine translation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, 86–93. Association for Computational Linguistics.
Brown, Peter F, John Cocke, Stephen A Della Pietra, Vincent J Della Pietra, Fredrick Jelinek, John D Lafferty, Robert L Mercer, and Paul S Roossin. 1990. A statistical approach to machine translation. Computational Linguistics 16(2):79–85.
Chang, Pi-Chuan, Michel Galley, and Christopher D Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proceedings of the Third Workshop on Statistical Machine Translation, 224–232. Association for Computational Linguistics.
Chang, Pi-Chuan, Huihsin Tseng, Dan Jurafsky, and Christopher D Manning. 2009. Discriminative reordering with Chinese grammatical relations features. In Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation, 51–59. Association for Computational Linguistics.
Collins, Michael, Philipp Koehn, and Ivona Kučerová. 2005. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 531–540. Association for Computational Linguistics.
Costa-Jussà, Marta Ruiz, and José Adrián Rodríguez Fonollosa. 2006. Statistical machine reordering. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), 70–76. Association for Computational Linguistics.
Fukui, Naoki. 1992. Theory of projection in syntax. Stanford, CA/Tokyo: CSLI Publisher/Kuroshio Publisher.
Gao, Qian. 2008. Word order in Mandarin: Reading and speaking. In Proceedings of the 20th North American Conference on Chinese Linguistics (NACCL-20), vol. 2, 611–626.
Gao, Qin, and Stephan Vogel. 2008. Parallel implementations of word alignment tool. In Proceedings of Software Engineering, Testing, and Quality Assurance for Natural Language Processing, 49–57. Association for Computational Linguistics.
Genzel, Dmitriy. 2010. Automatically learning source-side reordering rules for large scale machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 376–384. Association for Computational Linguistics.
Goto, Isao, Masao Utiyama, and Eiichiro Sumita. 2012. Post-ordering by parsing for Japanese-English statistical machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, 311–316. Association for Computational Linguistics.
Han, Dan, Katsuhito Sudoh, Xianchao Wu, Kevin Duh, Hajime Tsukada, and Masaaki Nagata. 2012. Head finalization reordering for Chinese-to-Japanese machine translation. In Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-6), 57–66. Association for Computational Linguistics.
Han, Dan, Pascual Martínez-Gómez, Yusuke Miyao, Katsuhito Sudoh, and Masaaki Nagata. 2013a. Effects of parsing errors on pre-reordering performance for Chinese-to-Japanese SMT. In Proceedings of the 27th Pacific Asia Conference on Language Information and Computing (PACLIC). The PACLIC Steering Committee.
Han, Dan, Pascual Martínez-Gómez, Yusuke Miyao, Katsuhito Sudoh, and Masaaki Nagata. 2013b. Using unlabeled dependency parsing for pre-reordering for Chinese-to-Japanese statistical machine translation. In Proceedings of the 2nd Workshop on Hybrid Approaches to Translation (HyTra), 25–33. Association for Computational Linguistics.
Hatori, Jun, Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), 1216–1224. Asian Federation of Natural Language Processing.
Isozaki, Hideki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh, and Hajime Tsukada. 2010a. Automatic evaluation of translation quality for distant language pairs. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 944–952. Association for Computational Linguistics.
Isozaki, Hideki, Katsuhito Sudoh, Hajime Tsukada, and Kevin Duh. 2010b. Head finalization: A simple reordering rule for SOV languages. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and Metrics MATR, 244–251. Association for Computational Linguistics.
Isozaki, Hideki, Katsuhito Sudoh, Hajime Tsukada, and Kevin Duh. 2012. HPSG-based preprocessing for English-to-Japanese translation. ACM Transactions on Asian Language Information Processing (TALIP) 11(3):8:1–8:16.
Kendall, Maurice G. 1938. A new measure of rank correlation. Biometrika 30(1/2):81–93.
Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics on Interactive Poster and Demonstration Sessions, 177–180. Association for Computational Linguistics.
Kudo, Taku, and Yuji Matsumoto. 2000. Japanese dependency structure analysis based on support vector machines. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics-Volume 13, 18–25. Association for Computational Linguistics.
Lee, Young-Suk, Bing Zhao, and Xiaoqiang Luo. 2010. Constituent reordering and syntax models for English-to-Japanese statistical machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 626–634. Association for Computational Linguistics.
Li, Charles N., and Sandra Annear Thompson. 1989. Mandarin Chinese: A functional reference grammar. Linguistics-Asian studies. Berkeley, CA: University of California Press.
Li, Chi-Ho, Minghui Li, Dongdong Zhang, Mu Li, Ming Zhou, and Yi Guan. 2007. A probabilistic approach to syntax-based reordering for statistical machine translation. In Proceedings of the 45th Annual Meeting on Association for Computational Linguistics (ACL), vol. 45(1), pp. 720–727. Association for Computational Linguistics.
Ma, Xiaoyi. 2006. Champollion: A robust parallel text sentence aligner. In Proceedings of 5th International Conference on Language Resources and Evaluation (LREC-5), 489–492. Citeseer.
Miller, James Edward, and Jim Miller. 2011. A critical introduction to syntax. New York: Continuum International Publishing Group.
Miyao, Yusuke, and Jun’ichi Tsujii. 2008. Feature forest models for probabilistic HPSG parsing. Computational Linguistics 34(1):35–80.
Neubig, Graham, Taro Watanabe, and Shinsuke Mori. 2012. Inducing a discriminative parser to optimize machine translation reordering. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 843–853. Association for Computational Linguistics.
Och, Franz Josef. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, 160–167. Association for Computational Linguistics.
Och, Franz Josef, and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics 29(1):19–51.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 311–318. Association for Computational Linguistics.
Petrov, Slav, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 433–440. Association for Computational Linguistics.
Pollard, Carl Jesse, and Ivan Andrew Sag. 1994. Head-driven phrase structure grammar. Chicago and Stanford, CA: The University of Chicago Press and CSLI Publications.
Ramanathan, Ananthakrishnan, Hansraj Choudhary, Avishek Ghosh, and Pushpak Bhattacharyya. 2009. Case markers and morphology: Addressing the crux of the fluency problem in English-Hindi SMT. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing, 800–808. Association for Computational Linguistics.
Rottmann, Kay, and Stephan Vogel. 2007. Word reordering in statistical machine translation with a POS-based distortion model. In Proceedings of the 11th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI), 171–180.
Snover, Matthew, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas (AMTA), 223–231. The Association for Machine Translation in the Americas.
Sudoh, Katsuhito, Xianchao Wu, Kevin Duh, Hajime Tsukada, and Masaaki Nagata. 2011. Post-ordering in statistical machine translation. In Proceedings of the 13th Machine Translation Summit, 316–323. The International Association for Machine Translation (IAMT).
Tillmann, Christoph, Stephan Vogel, Hermann Ney, Alex Zubiaga, and Hassan Sawaf. 1997. Accelerated DP based search for statistical translation. In Proceedings of the 5th European Conference on Speech Communication and Technology, 2667–2670.
Tromble, Roy, and Jason Eisner. 2009. Learning linear ordering problems for better translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 2, pp. 1007–1016. Association for Computational Linguistics.
Tsunakawa, Takashi, Naoaki Okazaki, Xiao Liu, and Jun’ichi Tsujii. 2009. A Chinese-Japanese lexical machine translation through a pivot language. ACM Transactions on Asian Language Information Processing 8(2):9:1–9:21.
Visweswariah, Karthik, Jiri Navratil, Jeffrey Sorensen, Vijil Chenthamarakshan, and Nanda Kambhatla. 2010. Syntax based reordering with automatically derived rules for improved statistical machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), 1119–1127. Association for Computational Linguistics.
Visweswariah, Karthik, Rajakrishnan Rajkumar, Ankur Gandhe, Ananthakrishnan Ramanathan, and Jiri Navratil. 2011. A word reordering model for improved machine translation. In Proceedings of Empirical Methods in Natural Language Processing, 486–496. Association for Computational Linguistics.
Wang, Chao, Michael Collins, and Philipp Koehn. 2007. Chinese syntactic reordering for statistical machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 737–745. Association for Computational Linguistics.
Wu, Hua, and Haifeng Wang. 2007. Pivot language approach for phrase-based statistical machine translation. Machine Translation 21(3):165–181.
Wu, Xianchao, Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, and Masaaki Nagata. 2011. Extracting pre-ordering rules from predicate-argument structures. In Proceedings of 5th International Joint Conference on Natural Language Processing (IJCNLP), November 2011, 29–37. Chiang Mai: Asian Federation of Natural Language Processing. http://www.aclweb.org/anthology/I111004.
Xia, Fei. 2000. The part-of-speech tagging guidelines for the Penn Chinese Treebank 3.0. Technical Report IRCS0007 (October 2000). Institute of Research and Cognitive Science (IRCS). Pennsylvania: University of Pennsylvania. http://repository.upenn.edu/ircs_reports/38/.
Xia, Fei, and Michael McCord. 2004. Improving a statistical MT system with automatically learned rewrite patterns. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), 508–514. Association for Computational Linguistics.
Xu, Peng, Jaeho Kang, Michael Ringgaard, and Franz Och. 2009. Using a dependency parser to improve SMT for subject-object-verb languages. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 245–253. Association for Computational Linguistics.
Yu, Kun, Yusuke Miyao, Takuya Matsuzaki, Xiangli Wang, and Junichi Tsujii. 2011. Analysis of the difficulties in Chinese deep parsing. In Proceedings of the 12th International Conference on Parsing Technologies, 48–57. Association for Computational Linguistics.
Zhao, Hong-Mei, Ya-Juan Lv, Guo-Sheng Ben, Yun Huang, and Qun Liu. 2011. Evaluation report for the 7th China workshop on machine translation (CWMT2011). In The 7th China Workshop on Machine Translation (CWMT2011). http://mt.xmu.edu.cn/cwmt2011/document/papers/e00.pdf.
Appendix: Summary of Part-of-Speech Tag Set in Penn Chinese Treebank
See Table 6.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Han, D., Martínez-Gómez, P., Miyao, Y. (2016). Syntax-Based Pre-reordering for Chinese-to-Japanese Statistical Machine Translation. In: Costa-jussà, M., Rapp, R., Lambert, P., Eberle, K., Banchs, R., Babych, B. (eds) Hybrid Approaches to Machine Translation. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21311-8_4
DOI: https://doi.org/10.1007/978-3-319-21311-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21310-1
Online ISBN: 978-3-319-21311-8
eBook Packages: Computer Science, Computer Science (R0)