Error Classification Using Automatic Measures Based on n-grams and Edit Distance

Benko, L’ubomír; Benkova, Lucia; Munkova, Dasa; Munk, Michal; Shulzenko, Danylo

doi:10.1007/978-3-031-20319-0_26

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1675)

Included in the following conference series:

International Conference on Advanced Research in Technologies, Information, Innovation and Sustainability

Abstract

Machine translation (MT) evaluation plays an important role in the translation industry. The main issue in evaluating MT quality is the unclear definition of translation quality. Several methods and techniques for measuring MT quality have been designed. Our study aims to connect manual error classification with automatic metrics of MT evaluation. We attempt to determine the degree of association between automatic MT metrics and error classes for translation from English into Slovak, an inflectional language. We created a corpus consisting of English journalistic texts taken from the British online newspaper The Guardian, together with their human and machine translations. The MT outputs, produced by Google Translate, were manually annotated by three professionals using a categorical framework for error analysis and were evaluated for proximity to the reference using automatic MT evaluation metrics. The results showed that not all of the examined automatic metrics based on n-grams or edit distance should be included in a model for determining MT quality. When determining the quality of machine translation with respect to syntactic-semantic correlativeness, it is sufficient to consider only Recall, BLEU-4 or F-measure, ROUGE-L and NIST (all based on n-grams) and CharacTER, which is based on edit distance.
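
The two metric families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the toy sentence pair are assumptions made for illustration. The n-gram part computes clipped precision, recall and a balanced F-measure, which are the building blocks of metrics such as BLEU (BLEU additionally combines several n-gram orders and applies a brevity penalty). The edit-distance part is plain Levenshtein distance; CharacTER itself is a character-level translation edit rate that also allows shifts, which this sketch omits.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams found in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def precision_recall_f(hypothesis, reference, n=1):
    """Clipped n-gram precision, recall and balanced F-measure."""
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # each n-gram counted at most as often as in the reference
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f_measure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f_measure


def edit_distance(a, b):
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical MT output and human reference, tokenised on whitespace.
hypothesis = "the cat is on the mat".split()
reference = "the cat sat on the mat".split()

print(precision_recall_f(hypothesis, reference, n=1))
print(precision_recall_f(hypothesis, reference, n=2))
# Word-level edit distance normalised by reference length (a WER-style score).
print(edit_distance(hypothesis, reference) / len(reference))
```

Passing strings instead of token lists to `edit_distance` yields a character-level distance, which is closer in spirit to character-based metrics and tends to be more forgiving of inflectional variation than word-level matching.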

Acknowledgements

This work was supported by the Slovak Research and Development Agency under contract No. APVV-18-0473 and Scientific Grant Agency of the Ministry of Education of the Slovak Republic (ME SR) and of Slovak Academy of Sciences (SAS) under the contract No. VEGA-1/0821/21.

Author information

Authors and Affiliations

Constantine the Philosopher University in Nitra, 949 01, Nitra, Slovakia
L’ubomír Benko, Lucia Benkova, Dasa Munkova, Michal Munk & Danylo Shulzenko

Corresponding author

Correspondence to L’ubomír Benko.

Editor information

Editors and Affiliations

Universidad Estatal Península de Santa Elena, La Libertad, Ecuador
Teresa Guarda
University of Minho, Guimarães, Portugal
Filipe Portela
BITrum Research Group, Leon, Spain
Maria Fernanda Augusto

About this paper

Cite this paper

Benko, L., Benkova, L., Munkova, D., Munk, M., Shulzenko, D. (2022). Error Classification Using Automatic Measures Based on n-grams and Edit Distance. In: Guarda, T., Portela, F., Augusto, M.F. (eds) Advanced Research in Technologies, Information, Innovation and Sustainability. ARTIIS 2022. Communications in Computer and Information Science, vol 1675. Springer, Cham. https://doi.org/10.1007/978-3-031-20319-0_26

DOI: https://doi.org/10.1007/978-3-031-20319-0_26
Published: 25 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20318-3
Online ISBN: 978-3-031-20319-0
eBook Packages: Computer Science, Computer Science (R0)
