Abstract
Hierarchical phrase-based statistical machine translation systems sometimes lose content words: during decoding, a source-side content word is translated into nothing on the target side. Although the BLEU scores of such translations are very high, the translations are difficult to understand because content words are missing. In this paper, we propose a simple and efficient phrase filtering method that checks how the content words of each phrase are translated in order to decide whether the phrase should be used in decoding. The experimental results show that the proposed method alleviates the loss of content words and improves the BLEU scores.
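The abstract does not say how content words are identified or what exactly the check tests, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each phrase pair carries word alignments, that content words can be recognised by a coarse part-of-speech tag, and that a phrase pair is discarded when any source-side content word is aligned to no target word. The names `PhrasePair`, `CONTENT_POS`, `dropped_content_words`, and `filter_phrase_table` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: content words are those tagged with one of these coarse POS tags.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


@dataclass(frozen=True)
class PhrasePair:
    """A hypothetical phrase-table entry with word alignments."""
    source: tuple       # source-side tokens
    source_pos: tuple   # one POS tag per source token
    target: tuple       # target-side tokens
    alignment: frozenset  # pairs (source_index, target_index)


def dropped_content_words(pair):
    """Return source content words that are aligned to no target word."""
    aligned_sources = {src for src, _ in pair.alignment}
    return [
        token
        for index, (token, pos) in enumerate(zip(pair.source, pair.source_pos))
        if pos in CONTENT_POS and index not in aligned_sources
    ]


def filter_phrase_table(pairs):
    """Keep only phrase pairs whose content words all have a translation."""
    return [pair for pair in pairs if not dropped_content_words(pair)]


# Illustrative data: the first pair drops the content word "经济".
lossy = PhrasePair(
    source=("经济", "的", "发展"),
    source_pos=("NOUN", "PART", "NOUN"),
    target=("development", "of"),
    alignment=frozenset({(2, 0), (1, 1)}),
)
complete = PhrasePair(
    source=("经济", "的", "发展"),
    source_pos=("NOUN", "PART", "NOUN"),
    target=("development", "of", "economy"),
    alignment=frozenset({(0, 2), (1, 1), (2, 0)}),
)
kept = filter_phrase_table([lossy, complete])
```

Under these assumptions, `kept` contains only the second pair, because the first leaves the noun "经济" without any aligned target word. A real system would also have to handle the nonterminals of hierarchical rules, which this sketch ignores.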
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Xie, J., Song, L., Lv, Y., Yao, J. (2013). Phrase Filtering for Content Words in Hierarchical Phrase-Based Model. In: Liu, P., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2013. Lecture Notes in Computer Science, vol 8229. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45185-0_51
DOI: https://doi.org/10.1007/978-3-642-45185-0_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45184-3
Online ISBN: 978-3-642-45185-0
eBook Packages: Computer Science (R0)