Machine Translation

Volume 21, Issue 2, pp 95–119

Evaluating machine translation with LFG dependencies

  • Karolina Owczarzak
  • Josef van Genabith
  • Andy Way

Abstract

In this paper we show how labelled dependencies produced by a Lexical-Functional Grammar parser can be used in Machine Translation evaluation. In contrast to most popular evaluation metrics based on surface string comparison, our dependency-based method does not unfairly penalize perfectly valid syntactic variations in the translation and shows less bias towards statistical models, while the addition of WordNet provides a way to accommodate lexical differences. In comparison with other metrics on Chinese–English newswire text, our method obtains high correlation with human scores at both segment and system level.
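
To make the scoring idea concrete, the following is a minimal Python sketch of matching labelled dependency triples between a candidate translation and a reference, assuming the (relation, head, dependent) triples have already been extracted by a parser; the function name, relation labels, and example triples are hypothetical illustrations, not the authors' implementation, and the WordNet-based lexical matching mentioned above is omitted.

    from collections import Counter

    def dependency_fscore(candidate_deps, reference_deps):
        # candidate_deps, reference_deps: lists of (relation, head, dependent)
        # triples, assumed to come from an external parse of each sentence.
        cand = Counter(candidate_deps)
        ref = Counter(reference_deps)
        # Clip matches so each reference triple is counted at most once.
        matches = sum(min(count, ref[triple]) for triple, count in cand.items())
        precision = matches / sum(cand.values()) if cand else 0.0
        recall = matches / sum(ref.values()) if ref else 0.0
        return (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)

    # Hypothetical triples for the reference "Yesterday the chairman resigned"
    # and the candidate "The chairman resigned yesterday": the dependencies are
    # identical, so the reordering that a surface n-gram metric would penalize
    # leaves the dependency-based score unchanged.
    reference = [("subj", "resign", "chairman"),
                 ("adjunct", "resign", "yesterday"),
                 ("det", "chairman", "the")]
    candidate = [("subj", "resign", "chairman"),
                 ("adjunct", "resign", "yesterday"),
                 ("det", "chairman", "the")]
    print(dependency_fscore(candidate, reference))  # 1.0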

Keywords

Machine translation · Evaluation metrics · Lexical-Functional Grammar · Labelled dependencies

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Karolina Owczarzak (1)
  • Josef van Genabith (1)
  • Andy Way (1)

  1. School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland
