ENTF: An Entropy-Based MT Evaluation Metric

Yu, Hui; Xu, Weizhi; Lin, Shouxun; Liu, Qun

doi:10.1007/978-981-10-7134-8_7


Part of the book series: Communications in Computer and Information Science (CCIS, volume 787)

Included in the following conference series:

China Workshop on Machine Translation


Abstract

Widely used automatic evaluation metrics cannot adequately reflect the fluency of translations. N-gram-based metrics such as BLEU limit the maximum length of matched fragments to n and cannot capture matched fragments longer than n, so they reflect fluency only indirectly. METEOR, which is not limited to n-grams, uses the number of matched chunks but does not consider the length of each chunk. In this paper, we propose an entropy-based metric (ENTF), which reflects the fluency of translations more fully through the distribution of matched words. To account for accuracy, we also incorporate the unigram F-score into the new metric. Experiments show that ENTF achieves state-of-the-art performance at the system level and is comparable with METEOR at the sentence level for the into-English direction on WMT 2012, WMT 2013 and WMT 2014.
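The preview does not give ENTF's formula, so the following Python sketch only illustrates the two ingredients the abstract names, under explicitly stated assumptions. It aligns hypothesis and reference unigrams by greedy exact matching (an assumption; METEOR additionally matches stems and WordNet synonyms), groups the matches into chunks as defined in note 1, computes the Shannon entropy of the distribution of matched words over those chunks, and computes the unigram F-score. All function names (`align_unigrams`, `chunk_lengths`, `chunk_entropy`, `unigram_f1`, `describe`) are hypothetical, and how ENTF actually combines entropy and F-score is not reproduced here.

```python
import math


def align_unigrams(hyp, ref):
    """ASSUMPTION: greedy exact matching, left to right. Each hypothesis token
    is mapped to the first unused identical reference token. This is a
    simplification, not the matcher used by ENTF or METEOR."""
    used = set()
    alignment = []
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if j not in used and rtok == tok:
                used.add(j)
                alignment.append((i, j))
                break
    return alignment  # sorted by hypothesis position


def chunk_lengths(alignment):
    """Group matches into chunks: adjacent hypothesis positions mapped to
    adjacent reference positions (the chunk definition in note 1)."""
    lengths = []
    prev = None
    for i, j in alignment:
        if prev is not None and i == prev[0] + 1 and j == prev[1] + 1:
            lengths[-1] += 1  # extends the current chunk
        else:
            lengths.append(1)  # starts a new chunk
        prev = (i, j)
    return lengths


def chunk_entropy(lengths):
    """Shannon entropy (bits) of matched words over chunks, p_k = len_k / total.
    Illustrative only; lower values mean matches sit in fewer, longer chunks."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in lengths)


def unigram_f1(n_matches, hyp_len, ref_len):
    """Balanced unigram F-score (harmonic mean of precision and recall)."""
    if n_matches == 0:
        return 0.0
    p = n_matches / hyp_len
    r = n_matches / ref_len
    return 2 * p * r / (p + r)


def describe(hyp_text, ref_text):
    """Report chunk statistics, entropy and unigram F1 for one sentence pair."""
    hyp, ref = hyp_text.split(), ref_text.split()
    alignment = align_unigrams(hyp, ref)
    lengths = chunk_lengths(alignment)
    return {
        "chunks": len(lengths),
        "chunk_lengths": lengths,
        "entropy_bits": round(chunk_entropy(lengths), 3),
        "unigram_f1": round(unigram_f1(len(alignment), len(hyp), len(ref)), 3),
    }


if __name__ == "__main__":
    ref = "he quickly read the long report"
    for hyp in ("the long report he quickly read",
                "report he quickly read the long"):
        print(hyp, "->", describe(hyp, ref))
```

Running it on the two example hypotheses illustrates the point made about METEOR: both contain all six reference words in exactly two chunks, so the chunk count (and a unigram F-score of 1.0) cannot distinguish them. The entropy, however, is 1.0 bit for chunk lengths [3, 3] and about 0.65 bits for [1, 5], where most matched words form one long fragment.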


Notes

1. The words in each chunk are in adjacent positions in the hypothesis, and are also mapped to unigrams that are in adjacent positions in the reference.
2. http://www.cs.cmu.edu/~alavie/METEOR/.
3. http://wordnet.princeton.edu/.
4. ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v13a.pl.
5. http://www.cs.umd.edu/~snover/tercom.
6. http://www.cs.cmu.edu/~alavie/METEOR/download/meteor-1.4.tgz.
7. ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v13a.pl.
8. http://www.cs.cmu.edu/~alavie/METEOR/download/meteor-1.4.tgz.


Acknowledgements

This work is supported by National Natural Science Foundation of P. R. China under Grant Nos. 61379086, 61602284, 61602285, 61602282 and Shandong Provincial Natural Science Foundation of China under Grant No. ZR2015FQ009. Qun Liu’s work is partially supported by the Science Foundation Ireland (Grant 13/RC/2106) as part of the ADAPT Centre at Dublin City University.

Author information

Authors and Affiliations

School of Management Science and Engineering, Shandong Normal University, Jinan, China
Hui Yu & Weizhi Xu
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shouxun Lin & Qun Liu
ADAPT Centre, School of Computing, Dublin City University, Dublin, Ireland
Qun Liu


Corresponding author

Correspondence to Qun Liu.

Editor information

Editors and Affiliations

University of Macau, Macau SAR, China
Derek F. Wong
Soochow University, Suzhou, China
Deyi Xiong


About this paper

Cite this paper

Yu, H., Xu, W., Lin, S., Liu, Q. (2017). ENTF: An Entropy-Based MT Evaluation Metric. In: Wong, D., Xiong, D. (eds) Machine Translation. CWMT 2017. Communications in Computer and Information Science, vol 787. Springer, Singapore. https://doi.org/10.1007/978-981-10-7134-8_7


DOI: https://doi.org/10.1007/978-981-10-7134-8_7
Published: 14 November 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7133-1
Online ISBN: 978-981-10-7134-8
eBook Packages: Computer Science, Computer Science (R0)
