ENTF: An Entropy-Based MT Evaluation Metric

Part of the Communications in Computer and Information Science book series (CCIS, volume 787)

Abstract

Widely used automatic evaluation metrics cannot adequately reflect the fluency of translations. N-gram-based metrics such as BLEU limit the maximum length of matched fragments to n and cannot capture matched fragments longer than n, so they reflect fluency only indirectly. METEOR, which is not limited to n-grams, uses the number of matched chunks but does not consider the length of each chunk. In this paper, we propose an entropy-based metric (ENTF), which reflects the fluency of translations through the distribution of matched words. To measure accuracy as well, we also incorporate the unigram F-score into the new metric. Experiments on the into-English directions of WMT 2012, WMT 2013 and WMT 2014 show that ENTF achieves state-of-the-art performance at the system level and is comparable with METEOR at the sentence level.
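The abstract contrasts METEOR, which counts chunks (runs of matched words that are adjacent in the hypothesis and map to adjacent words in the reference), with ENTF, which looks at how matched words are distributed. The minimal Python sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. It assumes exact-match, greedy one-to-one word alignment (METEOR itself also uses stemming, synonyms and a crossing-minimising alignment), uses the Shannon entropy of the chunk-length distribution as a stand-in for ENTF's entropy term, and uses the original METEOR fragmentation penalty 0.5·(chunks/matches)^3 for comparison. All helper names (`align`, `chunk_lengths`, `chunk_entropy`, `unigram_f`, `meteor_fragmentation`) are hypothetical, and because the abstract does not say how ENTF combines its entropy term with the unigram F-score, no final ENTF score is computed.

```python
import math


def align(hyp, ref):
    """Greedy one-to-one exact-match alignment (a simplifying assumption).

    Each hypothesis word is linked to the first unused identical reference
    word. Returns a list of (hyp_index, ref_index) pairs.
    """
    used = set()
    pairs = []
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if j not in used and word == ref_word:
                used.add(j)
                pairs.append((i, j))
                break
    return pairs


def chunk_lengths(pairs):
    """Split aligned pairs into chunks: maximal runs whose words are adjacent
    in the hypothesis and map to adjacent words in the reference."""
    lengths = []
    prev = None
    for i, j in sorted(pairs):
        if prev is not None and i == prev[0] + 1 and j == prev[1] + 1:
            lengths[-1] += 1  # extends the current chunk
        else:
            lengths.append(1)  # starts a new chunk
        prev = (i, j)
    return lengths


def chunk_entropy(lengths):
    """Shannon entropy (bits) of the share of matched words in each chunk.

    0.0 when all matches form one chunk; larger when matches are spread
    over many short chunks. (Hypothetical stand-in for ENTF's entropy term.)
    """
    total = sum(lengths)
    if total == 0:
        return 0.0
    probs = [n / total for n in lengths]
    return sum(-p * math.log2(p) for p in probs)


def unigram_f(matches, hyp_len, ref_len, beta=1.0):
    """Unigram F-beta from precision (matches/hyp_len) and recall (matches/ref_len)."""
    if matches == 0:
        return 0.0
    p = matches / hyp_len
    r = matches / ref_len
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def meteor_fragmentation(chunks, matches, gamma=0.5, beta=3.0):
    """Original METEOR penalty: depends only on the NUMBER of chunks."""
    if matches == 0:
        return 0.0
    return gamma * (chunks / matches) ** beta


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the mat sat on the cat".split()
    pairs = align(hyp, ref)
    lengths = chunk_lengths(pairs)
    print(lengths)                                    # [1, 1, 3, 1]
    print(round(chunk_entropy(lengths), 3))           # 1.792
    print(unigram_f(len(pairs), len(hyp), len(ref)))  # 1.0

    # Same number of chunks, different chunk lengths.
    for ls in ([3, 1, 1, 1], [2, 2, 1, 1]):
        print(ls, round(meteor_fragmentation(len(ls), sum(ls)), 3),
              round(chunk_entropy(ls), 3))
```

Chunk lengths [3, 1, 1, 1] and [2, 2, 1, 1] both give four chunks over six matched words, so the chunk-count penalty is identical (about 0.148), while the entropies differ (about 1.792 vs 1.918 bits). This is the kind of chunk-length information the abstract says METEOR does not use.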

Keywords

  • Automatic evaluation metric
  • Machine translation
  • Entropy-based metric

Acknowledgements

This work is supported by the National Natural Science Foundation of P. R. China under Grant Nos. 61379086, 61602284, 61602285 and 61602282, and by the Shandong Provincial Natural Science Foundation of China under Grant No. ZR2015FQ009. Qun Liu’s work is partially supported by the Science Foundation Ireland (Grant 13/RC/2106) as part of the ADAPT Centre at Dublin City University.

Author information

Correspondence to Qun Liu.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yu, H., Xu, W., Lin, S., Liu, Q. (2017). ENTF: An Entropy-Based MT Evaluation Metric. In: Wong, D., Xiong, D. (eds) Machine Translation. CWMT 2017. Communications in Computer and Information Science, vol 787. Springer, Singapore. https://doi.org/10.1007/978-981-10-7134-8_7

  • DOI: https://doi.org/10.1007/978-981-10-7134-8_7
  • Chapter length: 10 pages
  • Publisher Name: Springer, Singapore
  • Print ISBN: 978-981-10-7133-1
  • Online ISBN: 978-981-10-7134-8
  • eBook Packages: Computer Science, Computer Science (R0)