Machine Translation, Volume 25, Issue 4, pp 317–339

Syntactic discriminative language model rerankers for statistical machine translation

Open Access
Article

Abstract

This article describes a method that successfully exploits syntactic features for reranking n-best translation candidates using perceptrons. We motivate the utility of syntax by demonstrating that parsers outperform n-gram language models in differentiating between statistical machine translation output and human translations. Our approach uses discriminative language modelling to rerank the n-best translations generated by a statistical machine translation system. Performance is evaluated for Arabic-to-English translation using NIST’s MT-Eval benchmarks. While deep features extracted from parse trees do not consistently help, we show that features extracted from a shallow part-of-speech annotation layer outperform a competitive baseline and a state-of-the-art comparative reranking approach, leading to significant BLEU improvements on three different test sets.
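
The general idea of perceptron-based n-best reranking over part-of-speech features can be illustrated with a minimal sketch. It is not the authors' implementation: the feature set (POS n-gram counts up to bigrams), the plain unaveraged update rule, the `quality` field standing in for a sentence-level quality score such as BLEU, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def pos_ngram_features(pos_tags, n=2):
    """Count POS n-grams of orders 1..n, with sentence boundary markers."""
    padded = ["<s>"] + list(pos_tags) + ["</s>"]
    feats = Counter()
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[" ".join(padded[i:i + order])] += 1
    return feats


def score(weights, feats):
    """Linear model score: dot product of weights and sparse features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def rerank(weights, nbest):
    """Return the index of the best-scoring candidate (ties: earliest wins)."""
    scores = [score(weights, pos_ngram_features(c["pos"])) for c in nbest]
    return max(range(len(nbest)), key=lambda i: scores[i])


def train_perceptron(training_lists, epochs=5):
    """Structured perceptron: move weights towards the oracle candidate."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for nbest in training_lists:
            # Oracle = candidate with the highest (assumed) quality score.
            oracle = max(range(len(nbest)), key=lambda i: nbest[i]["quality"])
            predicted = rerank(weights, nbest)
            if predicted != oracle:
                for name, value in pos_ngram_features(nbest[oracle]["pos"]).items():
                    weights[name] += value
                for name, value in pos_ngram_features(nbest[predicted]["pos"]).items():
                    weights[name] -= value
    return weights


# Toy data (hypothetical): the decoder's first candidate has an odd tag order.
train_list = [
    {"pos": ["DT", "VBZ", "NN"], "quality": 0.3},
    {"pos": ["DT", "NN", "VBZ"], "quality": 0.8},
]
held_out = [
    {"pos": ["DT", "VBZ", "NN", "NN"]},
    {"pos": ["DT", "NN", "NN", "VBZ"]},
]

weights = train_perceptron([train_list])
best_train = rerank(weights, train_list)
best_held_out = rerank(weights, held_out)
print(best_train, best_held_out)
```

In this toy setting the learned weights reward tag bigrams seen in the better candidate and penalise those seen only in the worse one, so the second candidate is preferred in both lists. A realistic system would combine such features with the decoder's own model score and would typically average the weights over updates.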

Keywords

Statistical machine translation; Discriminative language models; Syntax

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. ISLA, University of Amsterdam, Amsterdam, The Netherlands
