A Diagnostic Evaluation Approach for English to Hindi MT Using Linguistic Checkpoints and Error Rates

  • Renu Balyan
  • Sudip Kumar Naskar
  • Antonio Toral
  • Niladri Chatterjee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7817)

Abstract

This paper addresses diagnostic evaluation of machine translation (MT) systems for Indian languages, English to Hindi MT in particular, assessing the performance of MT systems on relevant linguistic phenomena (checkpoints). We use the diagnostic evaluation tool DELiC4MT to analyze the performance of MT systems on various PoS categories (e.g. nouns, verbs). The tool currently supports only word-level checkpoints, which may be less informative for assessing translation quality than phrase-level checkpoints or checkpoints covering named entities (NEs), inflections, word order, etc. We therefore propose phrase-level checkpoints and NEs as additional checkpoints for DELiC4MT. We further use Hjerson to evaluate checkpoints based on word order and inflections, which are particularly relevant when Hindi is the target language. The experiments conducted using Hjerson generate overall (document-level) error counts and error rates for five error classes (inflectional errors, reordering errors, missing words, extra words, and lexical errors), thereby covering evaluation based on word order and inflections. The effectiveness of the approaches was tested on five English to Hindi MT systems.
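To make the five error classes concrete, the following is a minimal illustrative sketch of lemma-based error classification in the spirit of Hjerson. It is not Hjerson's actual algorithm (which classifies errors over WER/PER alignments); it uses simple multiset matching, and the `lemma` mapping, function name, and example tokens are assumptions for illustration only.

```python
from collections import Counter

def classify_errors(ref, hyp, lemma):
    """Toy word-level error classification inspired by Hjerson's
    error classes. `ref` and `hyp` are token lists; `lemma` maps a
    surface form to its lemma (an assumed, illustrative interface)."""
    matched = Counter(ref) & Counter(hyp)      # exact matches count as correct
    ref_left = Counter(ref) - matched          # reference words not produced
    hyp_left = Counter(hyp) - matched          # hypothesis words not in reference
    ref_lem = Counter(lemma.get(w, w) for w in ref_left.elements())
    hyp_lem = Counter(lemma.get(w, w) for w in hyp_left.elements())
    # same lemma on both sides but different surface form -> inflectional error
    infl = sum((ref_lem & hyp_lem).values())
    counts = {
        "inflection": infl,
        "missing": sum(ref_left.values()) - infl,  # reference word with no counterpart
        "lexical": sum(hyp_left.values()) - infl,  # wrong word choice in hypothesis
    }
    # "reorder" and "extra" are omitted here: separating them from the
    # classes above requires the word alignment Hjerson derives from WER.
    n = max(len(ref), 1)
    rates = {k: v / n for k, v in counts.items()}  # document-level error rates
    return counts, rates
```

For example, comparing the reference "the boy goes to school" against the hypothesis "the boys go school" yields two inflectional errors (boy/boys, goes/go) and one missing word (to) under this simplified scheme.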

Keywords

diagnostic evaluation · automatic evaluation metrics · DELiC4MT · Hjerson · checkpoints · errors

References

  1. Snover, M., Madnani, N., Dorr, B.J., Schwartz, R.: Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric. In: Proceedings of the 4th EACL Workshop on Statistical Machine Translation, pp. 259–268. Association for Computational Linguistics, Athens (2009)
  2. Callison-Burch, C., Fordyce, C., Koehn, P., Monz, C., Schroeder, J.: (Meta-) Evaluation of Machine Translation. In: Proceedings of the Second Workshop on Statistical Machine Translation, Prague, Czech Republic, pp. 136–158 (2007)
  3. Stymne, S., Ahrenberg, L.: On the practice of error analysis for machine translation evaluation. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, pp. 1785–1790 (2012)
  4. Papineni, K., Roukos, S., Ward, T., Zhu, W.: BLEU: A method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the ACL, Philadelphia, PA, USA, pp. 311–318 (2002)
  5. Doddington, G.: Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics. In: Proceedings of the Human Language Technology Conference (HLT), San Diego, CA, pp. 128–132 (2002)
  6. Banerjee, S., Lavie, A.: METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL 2005 Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization, Ann Arbor, Michigan, pp. 65–72 (2005)
  7. Lavie, A., Agarwal, A.: METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the Second ACL Workshop on Statistical Machine Translation, Prague, Czech Republic, pp. 228–231 (2007)
  8. Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the Association for Machine Translation in the Americas, AMTA 2006, Cambridge, MA, pp. 223–231 (2006)
  9. Chatterjee, N., Balyan, R.: Towards Development of a Suitable Evaluation Metric for English to Hindi Machine Translation. International Journal of Translation 23(1), 7–26 (2011)
  10. Gupta, A., Venkatapathy, S., Sangal, R.: METEOR-Hindi: Automatic MT Evaluation Metric for Hindi as a Target Language. In: Proceedings of ICON 2010: 8th International Conference on Natural Language Processing. Macmillan Publishers, India (2010)
  11. Ananthakrishnan, R., Bhattacharyya, P., Sasikumar, M., Shah, R.: Some issues in automatic evaluation of English-Hindi MT: More blues for BLEU. In: Proceedings of the 5th International Conference on Natural Language Processing (ICON 2007), Hyderabad, India (2007)
  12. Chatterjee, N., Johnson, A., Krishna, M.: Some improvements over the BLEU metric for measuring the translation quality for Hindi. In: Proceedings of the International Conference on Computing: Theory and Applications, ICCTA 2007, Kolkata, India, pp. 485–490 (2007)
  13. Moona, R.S., Sangal, R., Sharma, D.M.: MTeval: An Evaluation Methodology for Machine Translation Systems. In: Proceedings of SIMPLE 2004, Kharagpur, India, pp. 15–19 (2004)
  14. Toral, A., Naskar, S.K., Gaspari, F., Groves, D.: DELiC4MT: A Tool for Diagnostic MT Evaluation over User-defined Linguistic Phenomena. The Prague Bulletin of Mathematical Linguistics 98, 121–131 (2012)
  15. Popović, M.: Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output. The Prague Bulletin of Mathematical Linguistics 96, 59–68 (2011)
  16. Vilar, D., Xu, J., D'Haro, L.F., Ney, H.: Error analysis of statistical machine translation output. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, pp. 697–702 (2006)
  17. Farrús, M., Costa-jussà, M.R., Mariño, J.B., Fonollosa, J.A.R.: Linguistic-based evaluation criteria to identify statistical machine translation errors. In: Proceedings of EAMT, Saint Raphaël, France, pp. 52–57 (2010)
  18. Popović, M., Ney, H.: Towards automatic error analysis of machine translation output. Computational Linguistics 37(4), 657–688 (2011)
  19. Popović, M., Ney, H., de Gispert, A., Mariño, J.B., Gupta, D., Federico, M., Lambert, P., Banchs, R.: Morpho-syntactic information for automatic error analysis of statistical machine translation output. In: StatMT 2006: Proceedings of the Workshop on Statistical Machine Translation, New York, pp. 1–6 (2006)
  20. Popović, M., Burchardt, A.: From human to automatic error classification for machine translation output. In: Proceedings of EAMT 2011, Leuven, Belgium, pp. 265–272 (2011)
  21. Popović, M.: rgbF: An Open Source Tool for n-gram Based Automatic Evaluation of Machine Translation Output. The Prague Bulletin of Mathematical Linguistics 98, 99–108 (2012)
  22. Zeman, D., Fishel, M., Berka, J., Bojar, O.: Addicter: What Is Wrong with My Translations? The Prague Bulletin of Mathematical Linguistics 96, 79–88 (2011)
  23. Fishel, M., Sennrich, R., Popović, M., Bojar, O.: TerrorCat: A translation error categorization-based MT quality metric. In: WMT 2012: Proceedings of the Seventh Workshop on Statistical Machine Translation, Stroudsburg, PA, USA, pp. 64–70 (2012)
  24. Xiong, D., Zhang, M., Li, H.: Error detection for statistical machine translation using linguistic features. In: Proceedings of ACL 2010, Uppsala, Sweden, pp. 604–611 (2010)
  25. Zhou, M., Wang, B., Liu, S., Li, M., Zhang, D., Zhao, T.: Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Checkpoints. In: Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), Manchester, pp. 1121–1128 (2008)
  26. Naskar, S.K., Toral, A., Gaspari, F., Way, A.: A Framework for Diagnostic Evaluation of MT Based on Linguistic Checkpoints. In: Proceedings of the 13th Machine Translation Summit, Xiamen, China, pp. 529–536 (2011)
  27. Popović, M., Ney, H.: Word Error Rates: Decomposition over POS Classes and Applications for Error Analysis. In: Proceedings of the 2nd ACL 2007 Workshop on Statistical Machine Translation (WMT 2007), Prague, Czech Republic, pp. 48–55 (2007)
  28. Koehn, P.: Statistical Significance Tests for Machine Translation Evaluation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 385–395 (2004)
  29. Balyan, R., Naskar, S.K., Toral, A., Chatterjee, N.: A Diagnostic Evaluation Approach Targeting MT Systems for Indian Languages. In: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-COLING 2012), Mumbai, India, pp. 61–72 (2012)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Renu Balyan (1)
  • Sudip Kumar Naskar (2)
  • Antonio Toral (2)
  • Niladri Chatterjee (1)
  1. Indian Institute of Technology Delhi, India
  2. CNGL, School of Computing, Dublin City University, Dublin, Ireland