Multimedia and Network Information Systems, pp. 241–249
Semi-automatic and Human-Aided Translation Evaluation Metric (HMEANT) for Polish Language in Re-speaking and MT Assessment
Abstract
In this article we report the initial results of experiments applying the HMEANT metric (a semi-automatic evaluation metric that scores translation quality by matching semantic role fillers) to the Polish language. The metric is evaluated on a Machine Translation (MT) task and on re-speaking quality assessment. A GUI-based annotation interface was developed (https://github.com/krzwolk/HMEANT-metric-for-Polish), and with this tool the evaluation was carried out by personnel with no IT background. Reliability, correlation with automatic metrics, language independence, and time costs were analysed as well. Role labelling and alignment in the GUI were performed by two annotators with no relevant background, who received only about 10 minutes of instruction. The results of our experiments showed high inter-annotator agreement for role labelling and a good correlation of the HMEANT metric with human judgements based on re-speaking evaluation.
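The scoring step described above can be illustrated with a minimal sketch. It assumes the standard MEANT-style F-measure over annotator-aligned role fillers, in which exact matches count fully and partial matches are down-weighted (the weight 0.5 for partial matches is an assumption following common practice in the HMEANT literature, not a value taken from this article); the function name `hmeant_f` and the example counts are hypothetical.

```python
def hmeant_f(correct, partial, total_mt, total_ref, w_partial=0.5):
    """MEANT-style F-measure over matched semantic role fillers (sketch).

    correct / partial: counts of exactly and partially matched fillers,
    total_mt / total_ref: total fillers annotated in the MT output and
    the reference, w_partial: weight given to partial matches.
    """
    # Exact matches count fully; partial matches contribute w_partial each.
    matched = correct + w_partial * partial
    precision = matched / total_mt if total_mt else 0.0
    recall = matched / total_ref if total_ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of filler-level precision and recall.
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 6 exact and 2 partial matches,
# 10 fillers annotated in the MT output, 8 in the reference.
score = hmeant_f(6, 2, 10, 8)  # ≈ 0.778
```

In the published HMEANT definition the counts are additionally aggregated per semantic frame and weighted by the number of fillers in each frame; the sketch above collapses that aggregation into sentence-level totals for clarity.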
Keywords
HMEANT metric · HMEANT Polish · Machine translation evaluation · Text quality assessment · Re-speaking evaluation
Acknowledgments
This research was supported by Polish-Japanese Academy of Information Technology statutory resources (ST/MUL/2016) and resources for young researchers.