Machine Translation Evaluation: Manual Versus Automatic—A Comparative Study

Maurya, Kaushal Kumar; Ravindran, Renjith P.; Anirudh, Ch Ram; Murthy, Kavi Narayana

doi:10.1007/978-981-15-1097-7_45

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1079)

Abstract

The quality of machine translation (MT) is best judged by humans well versed in both the source and target languages. However, automatic techniques are often used because they are much faster, cheaper and language independent. The goal of this paper is to check for correlation between manual and automatic evaluation, specifically in the context of Indian languages. To the extent that automatic evaluation methods correlate with manual evaluations, we can get the best of both worlds. In this paper, we perform a comparative study of five automatic evaluation metrics (BLEU, NIST, METEOR, TER and WER) against the manual evaluation metric (adequacy) for English-Hindi translation. We also attempt to estimate the manual evaluation score of a given MT output from its automatic evaluation score. The data for the study was sourced from the Workshop on Statistical Machine Translation (WMT14).
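
The two steps named in the abstract, checking correlation and estimating a manual score from an automatic one, can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient or estimation model the authors used, so the Pearson coefficient and the least-squares line below are assumptions. The function names and the scores in the usage example are hypothetical and are not taken from the paper or from WMT14.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (mean_x, mean_y, covariance sum, x variance sum, y variance sum)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, cov, var_x, var_y


def pearson_r(xs, ys):
    """Pearson correlation between automatic scores xs and manual scores ys."""
    _, _, cov, var_x, var_y = _centered_sums(xs, ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for predicting ys from xs."""
    mean_x, mean_y, cov, var_x, _ = _centered_sums(xs, ys)
    if var_x == 0:
        raise ValueError("cannot fit a line to constant automatic scores")
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def estimate_manual_score(automatic_score, slope, intercept):
    """Estimate a manual (adequacy) score from one automatic score."""
    return slope * automatic_score + intercept


if __name__ == "__main__":
    # Hypothetical per-system scores, invented purely for illustration.
    bleu = [0.08, 0.11, 0.13, 0.17, 0.21]
    adequacy = [1.9, 2.2, 2.3, 2.9, 3.1]
    slope, intercept = fit_line(bleu, adequacy)
    print("Pearson r:", pearson_r(bleu, adequacy))
    print("Estimated adequacy at BLEU 0.15:",
          estimate_manual_score(0.15, slope, intercept))
```

For error-based metrics such as TER and WER, where lower scores indicate better output, agreement with adequacy would show up as a negative correlation.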

Notes

1. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl
2. http://www.cs.cmu.edu/~alavie/METEOR/
3. http://www.cs.umd.edu/~snover/tercom/
4. We used the tconfint_mean function available in the statsmodels package in Python.

Author information

Authors and Affiliations

School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
Kaushal Kumar Maurya, Renjith P. Ravindran, Ch Ram Anirudh & Kavi Narayana Murthy

Corresponding author

Correspondence to Kaushal Kumar Maurya.

Editor information

Editors and Affiliations

Professor & Head, Department of Computer Science & Engineering, CMR Technical Campus, Hyderabad, India
K. Srujan Raju
Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Roman Senkerik
Stanley College of Engineering and Technology, Hyderabad, Telangana, India
Satya Prasad Lanka
Department of EEE, Stanley College of Engineering and Technology, Hyderabad, Telangana, India
V. Rajagopal

Cite this paper

Maurya, K.K., Ravindran, R.P., Anirudh, C.R., Murthy, K.N. (2020). Machine Translation Evaluation: Manual Versus Automatic—A Comparative Study. In: Raju, K.S., Senkerik, R., Lanka, S.P., Rajagopal, V. (eds) Data Engineering and Communication Technology. Advances in Intelligent Systems and Computing, vol 1079. Springer, Singapore. https://doi.org/10.1007/978-981-15-1097-7_45

DOI: https://doi.org/10.1007/978-981-15-1097-7_45
Published: 09 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1096-0
Online ISBN: 978-981-15-1097-7
eBook Packages: Intelligent Technologies and Robotics (R0)
