AMTA 2004: Machine Translation: From Real Users to Research, pp. 64–73
A Fluency Error Categorization Scheme to Guide Automated Machine Translation Evaluation
Abstract
Existing automated MT evaluation methods often require expert human translations. These are produced for every language pair evaluated and, due to this expense, subsequent evaluations tend to rely on the same texts, which do not necessarily reflect real MT use. In contrast, we are designing an automated MT evaluation system, intended for use by post-editors, purchasers and developers, that requires nothing but the raw MT output. Furthermore, our research is based on texts that reflect corporate use of MT. This paper describes our first step in system design: a hierarchical classification scheme of fluency errors in English MT output, to enable us to identify error types and frequencies, and guide the selection of errors for automated detection. We present results from the statistical analysis of 20,000 words of MT output, manually annotated using our classification scheme, and describe correlations between error frequencies and human scores for fluency and adequacy.
Keywords
Error Type · Machine Translation · Text Type · Source Text · Parallel Corpus