BiMEANT: Integrating Cross-Lingual and Monolingual Semantic Frame Similarities in the MEANT Semantic MT Evaluation Metric

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8791)

Abstract

We present experimental results showing that integrating cross-lingual semantic frame similarity into MEANT, the semantic-frame-based automatic MT evaluation metric, improves its correlation with human judgments of translation adequacy. Recent work shows that MEANT reflects translation adequacy more accurately than other automatic MT evaluation metrics such as BLEU or TER, and that optimizing SMT systems against MEANT robustly improves translation quality across different output languages. However, in some cases the human reference translation employs different scoping strategies from the input sentence. Standard monolingual MEANT, which assesses translation quality only via the semantic frame similarity between the reference and machine translations, then fails to reward the adequacy of the machine translation fairly and accurately. To address this issue we propose a new bilingual metric, BiMEANT, that correlates with human judgment more closely than MEANT by incorporating new cross-lingual semantic frame similarity assessments into MEANT.

Notes

Acknowledgment

This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under BOLT contract nos. HR0011-12-C-0014 and HR0011-12-C-0016, and GALE contract nos. HR0011-06-C-0022 and HR0011-06-C-0023; by the European Union under the FP7 grant agreement no. 287658; and by the Hong Kong Research Grants Council (RGC) research grants GRF620811, GRF621008, and GRF612806. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, the EU, or RGC. Thanks to Markus Saers, Meriem Beloucif, and Karteek Addanki for supporting work, and to Pascale Fung, Yongsheng Yang and Zhaojun Wu for sharing the maximum entropy Chinese segmenter and C-ASSERT, the Chinese semantic parser.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Human Language Technology Center, Department of Computer Science and Engineering, Hong Kong University of Science and Technology (HKUST), Kowloon, Hong Kong
