A vector-space dynamic feature for phrase-based statistical machine translation
In this paper, we propose and evaluate a novel dynamic feature function for log-linear model combinations in phrase-based statistical machine translation. The feature function is inspired by the well-known vector-space model, which is typically used in information retrieval and text mining applications, and it aims to improve translation unit selection at decoding time by incorporating context information from the source language. Significant improvements on an English-Spanish experimental corpus are presented and discussed.
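As a rough illustration of the idea, the following minimal sketch scores candidate translations of an ambiguous source word by the cosine similarity between the input sentence and the source-side training sentences associated with each candidate. The bag-of-words weighting, the use of the maximum similarity, the function names (`bow`, `cosine`, `context_feature`), and the toy data are all assumptions made for illustration; they are not taken from the paper and do not reproduce the authors' implementation.

```python
import math
from collections import Counter


def bow(sentence):
    """Term-frequency bag-of-words vector for a whitespace-tokenized sentence."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_feature(input_sentence, training_sources):
    """Hypothetical feature value: the highest similarity between the input
    sentence and any training source sentence linked to a translation unit."""
    query = bow(input_sentence)
    return max((cosine(query, bow(s)) for s in training_sources), default=0.0)


# Hypothetical toy data: two Spanish candidates for English "bank", each with
# the English training sentences in which that translation was observed.
candidates = {
    "banco": [
        "the bank approved the loan",
        "she opened an account at the bank",
    ],
    "orilla": [
        "they walked along the river bank",
        "the boat reached the bank of the river",
    ],
}

input_sentence = "he fished from the bank of the river"
scores = {t: context_feature(input_sentence, srcs) for t, srcs in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In a log-linear decoder, a value of this kind would be one additional feature whose weight is tuned alongside the standard translation and language model scores. It is called dynamic because it depends on the sentence being translated rather than being fixed in the phrase table.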
Keywords: Statistical machine translation; Source context information; Vector-space model
The authors would like to thank Barcelona Media Innovation Center and the Institute for Infocomm Research for their support and permission to publish this research. We would also like to thank Bart Mellebeek for his helpful contribution, and the anonymous reviewers of this paper for their valuable suggestions.
This work has been partially funded by the Spanish Department of Education and Science through the Juan de la Cierva fellowship program and the Spanish Government under the BUCEADOR project (TEC2009-14094-C04-01).