Syntactic discriminative language model rerankers for statistical machine translation

Carter, Simon; Monz, Christof

doi:10.1007/s10590-011-9108-7

Open access
Published: 01 September 2011

Volume 25, pages 317–339 (2011)

Machine Translation

Abstract

This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language models in differentiating between statistical machine translation output and human translations. Our approach uses discriminative language modelling to rerank the n-best translations generated by a statistical machine translation system. The performance is evaluated for Arabic-to-English translation using NIST’s MT-Eval benchmarks. While deep features extracted from parse trees do not consistently help, we show how features extracted from a shallow Part-of-Speech annotation layer outperform a competitive baseline and a state-of-the-art comparative reranking approach, leading to significant BLEU improvements on three different test sets.
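The reranking idea in the abstract can be illustrated with a minimal sketch of a Collins-style perceptron reranker over n-best lists. It assumes that POS tags for each candidate translation are already available and that each candidate carries a quality score such as sentence-level BLEU. The feature set (POS n-gram counts), the oracle choice, and the function names `pos_ngram_features`, `train_perceptron` and `rerank` are illustrative assumptions, not the authors' implementation; a practical reranker would likely also combine these features with the baseline system's model score, which is omitted here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


# Hypothetical feature extractor: counts POS n-grams (orders 1..n) over a
# candidate translation's tag sequence, padded with sentence boundaries.
def pos_ngram_features(tags: Sequence[str], n: int = 3) -> Counter:
    padded = ["<s>"] + list(tags) + ["</s>"]
    feats: Counter = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats["pos:" + " ".join(padded[i:i + order])] += 1
    return feats


def score(weights: Dict[str, float], feats: Counter) -> float:
    # Sparse dot product between the weight vector and a feature vector.
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(
    nbest_lists: List[List[Tuple[Counter, float]]], epochs: int = 5
) -> Dict[str, float]:
    # Each n-best list holds (features, quality) pairs; quality is assumed
    # to be something like sentence-level BLEU against the reference.
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for nbest in nbest_lists:
            # Oracle: the candidate with the best quality score.
            oracle = max(nbest, key=lambda h: h[1])[0]
            # Prediction: the candidate the current model ranks highest.
            predicted = max(nbest, key=lambda h: score(weights, h[0]))[0]
            if predicted is not oracle:
                # Perceptron update: move towards the oracle's features
                # and away from the mistaken prediction's features.
                for f, v in oracle.items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in predicted.items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


def rerank(weights: Dict[str, float], candidates: List[Counter]) -> int:
    # Index of the highest-scoring candidate under the learned weights.
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))
```

For example, training on a single two-candidate list in which the higher-quality candidate is tagged `DT NN VBZ` and the other `NN DT VBZ` gives `pos:DT NN` a positive weight and `pos:NN DT` a negative one, so the reranker then prefers the determiner-noun order on unseen candidates as well. This is a plain, non-averaged perceptron kept short for clarity; averaged or voted variants are common in practice for more stable weights.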

References

Arun A, Koehn P (2007) Online learning methods for discriminative training of phrase based statistical machine translation. In: Machine translation summit XI: proceedings, Copenhagen, pp 15–20
Bikel DM (2002) Design of a multi-lingual, parallel-processing statistical parsing engine. In: HLT 2002: human language technology conference, proceedings of the second international conference on human language technology research, San Diego, pp 178–182
Bilmes JA, Kirchhoff K (2003) Factored language models and generalized parallel backoff. In: HLT-NAACL 2003: conference combining human language technology conference series and the North American chapter of the Association for Computational Linguistics conference series, Edmonton, pp 4–6
Birch A, Osborne M, Koehn P (2007) CCG supertags in factored statistical machine translation. In: Proceedings of the second workshop on statistical machine translation (WMT 2007), Prague, pp 9–16
Blunsom P, Cohn T, Osborne M (2008) A discriminative latent variable model for statistical machine translation. In: ACL-08: HLT, 46th annual meeting of the Association for Computational Linguistics: human language technologies, proceedings of the conference, Columbus, pp 200–208
Brown PF, Pietra VJ, de Souza PV, Lai JC, Mercer RL (1992) Class-based n-gram models of natural language. Comput Linguist 18(4): 467–479
Callison-Burch C, Osborne M, Koehn P (2006) Re-evaluating the role of BLEU in machine translation research. In: EACL-2006: 11th conference of the European chapter of the Association for Computational Linguistics, proceedings of the conference, Trento, pp 249–256
Carter S, Monz C (2009) Parsing statistical machine translation output. In: Proceedings of the language & technology conference (LTC 2009), Poznań, pp 270–274
Carter S, Monz C (2010) Discriminative syntactic reranking for statistical machine translation. In: AMTA 2010: proceedings of the ninth conference of the Association for Machine Translation in the Americas, Denver, pp 3–12
Chang PC, Toutanova K (2007) A discriminative syntactic word order model for machine translation. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics (ACL 2007), Prague, pp 9–16
Chen SF, Goodman J (1998) An empirical study of smoothing techniques for language modeling. Tech. Rep. TR-10-98. Harvard University, Cambridge
Chen X, Wang H, Lin X (2009) Learning to rank with a novel kernel perceptron method. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM 2009), Hong Kong, pp 505–512
Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: 43rd annual meeting of the Association for Computational Linguistics (ACL 2005), Ann Arbor, pp 263–270
Chiang D (2007) Hierarchical phrase-based translation. Comput Linguist 33(2): 201–228
Chiang D, Marton Y, Resnik P (2008) Online large-margin training of syntactic and structural translation features. In: EMNLP 2008: 2008 conference on empirical methods in natural language processing, Proceedings of the conference, Honolulu, pp 224–233
Chiang D, Wang W, Knight K (2009) 11,001 new features for statistical machine translation. In: Human language technologies: the 2009 annual conference of the North American chapter of the Association for Computational Linguistics, proceedings of the conference, Boulder, pp 218–226
Collins M (1997) Three generative, lexicalized models for statistical parsing. In: Cohen PR, Wahlster W (eds) 35th annual meeting of the Association for Computational Linguistics and 8th conference of the European chapter of the Association for Computational Linguistics, proceedings of the conference, Madrid, pp 16–23
Collins M (1999) Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania, Philadelphia, Pennsylvania
Collins M, Duffy N (2002) New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. In: 40th annual meeting of the Association for Computational Linguistics, proceedings of the conference, Philadelphia, pp 263–270
Collins M, Roark B, Saraclar M (2005) Discriminative syntactic language modeling for speech recognition. In: 43rd annual meeting of the Association for Computational Linguistics (ACL 2005), Ann Arbor, pp 507–514
Crammer K, Singer Y (2001) Pranking with ranking. In: Advances in neural information processing systems 14 (NIPS 2001), Vancouver, pp 641–647
Elsas JL, Carvalho VR, Carbonell JG (2008) Fast learning of document ranking functions with the committee perceptron. In: Proceedings of the international conference on web search and web data mining (WSDM 2008), Stanford, pp 55–64
Emami A, Papineni K, Sorensen J (2007) Large-scale distributed language modeling. In: Proceedings of the international conference on acoustics, speech and signal processing (ICASSP 2007), Honolulu, pp 37–40
Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3): 277–296
Gallant SI (1990) Perceptron-based learning algorithms. IEEE Trans Neural Netw 1(2): 179–191
Hasan S, Bender O, Ney H (2006) Reranking translation hypotheses using structural properties. In: EACL-2006: 11th conference of the European chapter of the Association for Computational Linguistics, proceedings of the conference, Trento, pp 41–48
Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 conference on empirical methods in natural language processing, Barcelona, pp 388–395
Koehn P, Hoang H (2007) Factored translation models. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CONLL 2007), Prague, pp 868–876
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: HLT-NAACL 2003: conference combining human language technology conference series and the North American chapter of the Association for Computational Linguistics conference series, Edmonton, pp 48–54
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: ACL 2007, proceedings of the interactive poster and demonstration sessions, Prague, pp 177–180
Kulesza A, Shieber S (2004) A learning approach to improving sentence-level MT evaluation. In: TMI-2004: proceedings of the tenth conference on theoretical and methodological issues in machine translation, Baltimore, pp 75–84
Li Z, Khudanpur S (2008) Large-scale discriminative n-gram language models for statistical machine translation. In: AMTA-2008: MT at work: proceedings of the eighth conference of the Association for Machine Translation in the Americas, Waikiki, pp 133–142
Liang P, Bouchard-Côté A, Klein D, Taskar B (2006) An end-to-end discriminative approach to machine translation. In: COLING ACL 2006, 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, proceedings of the conference, Sydney, pp 761–768
Lin CY, Och FJ (2004) Orange: a method for evaluating automatic evaluation metrics for machine translation. In: 20th international conference on computational linguistics, proceedings, vol I, Geneva, pp 501–507
Marcus M, Kim G, Marcinkiewicz MA, Macintyre R, Bies A, Ferguson M, Katz K, Schasberger B (1994) The Penn Treebank: annotating predicate argument structure. In: Human language technology, proceedings of a workshop, Plainsboro, pp 114–119
McDonald R (2007) Characterizing the errors of data-driven dependency parsing models. In: EMNLP-CoNLL 2007: proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning, Prague, pp 121–131
Mohit B, Hwa R (2007) Localization of difficult-to-translate phrases. In: Proceedings of the second workshop on statistical machine translation (WMT 2007), Prague, pp 248–255
Och FJ (2003) Minimum error rate training in statistical machine translation. In: 41st annual meeting of the Association for Computational Linguistics, proceedings of the conference, Sapporo, pp 160–167
Och FJ, Ney H (2000) Improved statistical alignment models. In: 38th annual meeting of the Association for Computational Linguistics, proceedings of the conference, Hong Kong, pp 440–447
Och FJ, Gildea D, Khudanpur S, Sarkar A, Yamada K, Fraser A, Kumar S, Shen L, Smith D, Eng K, Jain V, Jin Z, Radev D (2003) Syntax for statistical machine translation. Tech. Rep. IRCS-00-07. Johns Hopkins 2003 Summer Workshop, Baltimore
Och FJ, Gildea D, Khudanpur S, Sarkar A, Yamada K, Fraser A, Kumar S, Shen L, Smith D, Eng K, Jain V, Jin Z, Radev D (2004) A smorgasbord of features for statistical machine translation. In: HLT-NAACL 2004: human language technology conference of the North American chapter of the Association for Computational Linguistics, proceedings of the main conference, Boston, pp 161–168
Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: 40th annual meeting of the Association for Computational Linguistics, proceedings of the conference, Philadelphia, pp 311–318
Post M, Gildea D (2008) Parsers as language models for statistical machine translation. In: AMTA-2008: MT at work: proceedings of the eighth conference of the Association for Machine Translation in the Americas, Waikiki, pp 172–181
Roark B, Saraclar M, Collins M (2004a) Corrective language modeling for large vocabulary ASR with the perceptron algorithm. In: Proceedings of the international conference on acoustics, speech and signal processing (ICASSP 2004), Montreal, pp 749–752
Roark B, Saraclar M, Collins M, Johnson M (2004b) Discriminative language modeling with conditional random fields and the perceptron algorithm. In: ACL-04, 42nd annual meeting of the Association for Computational Linguistics, proceedings of the conference, Barcelona, pp 47–54
Roark B, Saraclar M, Collins M (2007) Discriminative n-gram language modeling. Comput Speech Lang 21(2): 373–392
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6): 386–408
Shen L, Sarkar A, Och FJ (2004) Discriminative reranking for machine translation. In: HLT-NAACL 2004: human language technology conference of the North American chapter of the Association for Computational Linguistics, proceedings of the main conference, Boston, pp 177–184
Singh-Miller N, Collins M (2007) Trigger-based language modeling using a loss-sensitive perceptron algorithm. In: Proceedings of the international conference on acoustics, speech and signal processing (ICASSP 2007), Honolulu, pp 25–28
Stolcke A (2002) SRILM—an extensible language modeling toolkit. In: Proceedings of the international conference on spoken language processing (ICSLP 2002), Denver, pp 901–904
Tillmann C, Zhang T (2006) A discriminative global training algorithm for statistical MT. In: COLING ACL 2006, 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, proceedings of the conference, Sydney, pp 721–728
Watanabe T, Suzuki J, Tsukada J, Isozaki H (2007) Online large-margin training for statistical machine translation. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CONLL 2007), Prague, pp 764–773

Acknowledgments

The authors would like to thank Valentin Jijkoun, Sophia Katrenko and the anonymous reviewers for their insightful comments and helpful discussions. This work has been funded in part by the European Commission through the CoSyne project FP7-ICT-4-248531.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Author information

Authors and Affiliations

ISLA, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands
Simon Carter & Christof Monz

Corresponding author

Correspondence to Simon Carter.

Additional information

This work is a revised and substantially expanded version of Carter and Monz (2009) and Carter and Monz (2010).

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Carter, S., Monz, C. Syntactic discriminative language model rerankers for statistical machine translation. Machine Translation 25, 317–339 (2011). https://doi.org/10.1007/s10590-011-9108-7

Received: 28 December 2010
Accepted: 12 August 2011
Published: 01 September 2011
Issue Date: December 2011
DOI: https://doi.org/10.1007/s10590-011-9108-7
