A Fluency Error Categorization Scheme to Guide Automated Machine Translation Evaluation

  • Debbie Elliott
  • Anthony Hartley
  • Eric Atwell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3265)

Abstract

Existing automated MT evaluation methods often require expert human translations. These are produced for every language pair evaluated and, due to this expense, subsequent evaluations tend to rely on the same texts, which do not necessarily reflect real MT use. In contrast, we are designing an automated MT evaluation system, intended for use by post-editors, purchasers and developers, that requires nothing but the raw MT output. Furthermore, our research is based on texts that reflect corporate use of MT. This paper describes our first step in system design: a hierarchical classification scheme of fluency errors in English MT output, to enable us to identify error types and frequencies, and guide the selection of errors for automated detection. We present results from the statistical analysis of 20,000 words of MT output, manually annotated using our classification scheme, and describe correlations between error frequencies and human scores for fluency and adequacy.
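
As a rough illustration of the workflow the abstract describes (annotating MT output against a hierarchical error scheme, counting error frequencies, and correlating those counts with human judgements), the sketch below uses an invented mini-scheme, invented annotations and invented fluency scores; none of these reflect the authors' actual categories or data.

    # A minimal sketch of the workflow described above, NOT the authors' tool.
    # ERROR_SCHEME, the annotations and the fluency scores are all invented
    # placeholders for illustration only.
    from collections import Counter
    from statistics import mean
    import math

    # Hypothetical hierarchy: top-level fluency categories with subtypes.
    ERROR_SCHEME = {
        "grammar": ["agreement", "verb_form", "word_order"],
        "lexis": ["untranslated_word", "wrong_term"],
        "style": ["awkward_phrasing"],
    }

    # One list of (category, subtype) annotations per MT output text.
    annotations = [
        [("grammar", "agreement"), ("lexis", "wrong_term"), ("grammar", "word_order")],
        [("lexis", "untranslated_word")],
        [("grammar", "verb_form"), ("style", "awkward_phrasing"), ("lexis", "wrong_term")],
    ]

    # Hypothetical human fluency scores (e.g. on a 1-5 scale), one per text.
    fluency_scores = [2.0, 4.0, 2.5]

    # Sanity-check the annotations against the scheme.
    assert all(sub in ERROR_SCHEME[cat] for text in annotations for cat, sub in text)

    def pearson(xs, ys):
        # Plain Pearson correlation coefficient.
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # Error frequencies per top-level category, pooled over all texts.
    frequencies = Counter(cat for text in annotations for cat, _sub in text)
    print("Error frequencies:", dict(frequencies))

    # Total error count per text vs. human fluency score.
    totals = [len(text) for text in annotations]
    print("Correlation with fluency:", round(pearson(totals, fluency_scores), 3))

On data like this one would expect a negative correlation (more errors, lower fluency); identifying which individual error types correlate most strongly with the human scores is the kind of relationship the paper investigates.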

Keywords

Error Type · Machine Translation · Text Type · Source Text · Parallel Corpus

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Debbie Elliott¹
  • Anthony Hartley¹
  • Eric Atwell¹

  1. School of Computing and Centre for Translation Studies, University of Leeds, UK
