Quality evaluation in community post-editing

Published in: Machine Translation 28, 237–262 (2014)

Abstract

Machine translation is increasingly being deployed to translate user-generated content (UGC). In many situations, post-editing is required to ensure that the translations are correct and comprehensible for the users. Post-editing by professional translators is not always feasible in the context of UGC within online communities, and so members of such communities are sometimes asked to translate or post-edit content on behalf of the community. How should we measure the quality of UGC that has been post-edited by community members? Is quality evaluation by community members a feasible alternative to professional evaluation techniques? This paper describes the outcomes of three quality evaluation methods for community post-edited content: (1) an error annotation performed by a trained linguist; (2) evaluation of fluency and fidelity by domain specialists; (3) evaluation of fluency by community members. The study finds correlations between the results of the domain specialist evaluation and the community evaluation for content machine-translated from English into German in an online technical support community. Interestingly, the community evaluators were more critical in their fluency ratings than the domain experts. Although the results of the error annotation seem to contradict those of the domain specialist evaluation, a higher number of errors in the error annotation appears to result in lower scores in the domain specialist evaluation. We conclude that, within the context of this evaluation, post-editing by community members is feasible, though with considerable variation across individuals, and that evaluation by the community is also a feasible proposition.
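
The correlation claim above can be made concrete with a small computation. A minimal sketch, assuming per-sentence mean fluency ratings from each evaluator group and using Spearman's rank correlation (the ratings below are invented for illustration; they are not the paper's data, and the paper's own correlation method is described in the full text):

    # Rank correlation between domain-specialist and community fluency
    # ratings for the same post-edited sentences. All numbers are
    # invented for illustration.
    from scipy.stats import spearmanr

    # Mean fluency rating per sentence on the 5-point scale, one list per group.
    expert_fluency = [4.2, 3.8, 4.6, 2.9, 3.5, 4.0]
    community_fluency = [3.9, 3.4, 4.3, 2.5, 3.1, 3.8]  # community rates more critically

    rho, p_value = spearmanr(expert_fluency, community_fluency)
    print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")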


Notes

  1. It is important to distinguish ‘community translation’ in the sense of translation in an online community from the concept of community translation also known as ‘public service translation’. In this paper, the term community translation refers to online community translation.

  2. https://www.evaluation.taus.net/about.

  3. http://www.qt21.eu/launchpad/.

  4. All evaluators in this study were domain specialists. Four of the seven evaluators (57%) were also trained linguists. See Sect. 2.2.2 for more detail.

  5. ST: source text.

  6. QC: quality control.

  7. http://www.de.community.norton.com/.

  8. http://www.accept-portal.eu.

  9. A “task” in this context denotes the combination of a subject line, the initial question posted in a forum thread and the post in that thread that was marked as the solution to the problem, as presented in the left panel of Fig. 1.

  10. http://www.community.norton.com/.

  11. Note that the guidelines were provided in German, the working language of the community. The guidelines are reproduced, in English translation, in the Appendix.

  12. http://www.accept-portal.eu.

  13. Available for download at http://www.ida.liu.se/~sarst/blast/.

  14. https://www.github.com/cfedermann/Appraise.

  15. In de Almeida’s categorisation, the latter is an extra category. Here it is integrated into accuracy errors, as mistranslations affect accuracy and thus the fidelity of the translation.

  16. There were two groups of post-editors editing comparable content (cf. Sect. 2.1).

  17. This includes errors present in the raw MT output that remained uncorrected and errors introduced by the post-editors.

  18. This was done to increase exposure of these sentences: the display order was rotated so that sentence 21 was the first sentence to be displayed, sentence 22 the second, and so on. This was deemed appropriate, as the average number of ratings in one sitting was eight. A minimal sketch of this rotation is given below.
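
A minimal sketch of the rotation described in note 18, assuming the sentences are held in a simple numbered list (the pool of 40 sentences is invented for illustration; the starting point of 21 follows the note):

    # Rotate the display order so that sentence 21 is shown first,
    # giving later sentences the same exposure as earlier ones.
    sentences = list(range(1, 41))         # hypothetical pool of 40 sentences
    start = sentences.index(21)            # rotation point from the note
    display_order = sentences[start:] + sentences[:start]
    print(display_order[:8])               # an average sitting of 8 ratings sees 21-28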

References

  • Baer N (2010) Trends in crowdsourcing: case studies from not-for-profit and for-profit organisations. ATA 2010, Oct 27–30, 2010, Denver, Colorado, USA. http://www.wiki-translation.com/tiki-download_wiki_attachment.php?attId=62. Accessed 6 Jan 2014

  • Banerjee P (2013) Domain adaptation for statistical machine translation of corporate and user-generated content. Dissertation, Dublin City University

  • Blanchon H, Boitet C, Huynh C (2009) A web service enabling gradable post-edition of pre-translations produced by existing translation tools: practical use to provide high-quality translation of an online encyclopedia. MT Summit XII, Beyond translation memories: new tools for translators workshop. Ottawa, Canada, pp 20–27

  • de Almeida G (2013) Translating the post-editor: an investigation of post-editing changes and correlations with professional experience across two Romance languages. Dissertation, Dublin City University

  • Désilets A, van der Meer J (2011) Co-creating a repository of best-practices for collaborative translators. In: O’Hagan M (ed) Linguistica Antverpiensia New Series: Themes in Translation Studies. Translation as a social activity, 10/2011, pp 27–45

  • Drugan J (2013) Quality in professional translation: assessment and improvement. Bloomsbury, London

  • Dugast L, Senellart J, Koehn P (2007) Statistical post-editing on SYSTRAN’s rule-based translation system. In: Proceedings of the second workshop on statistical machine translation. StatMT ’07. Association for Computational Linguistics. Prague, Czech Republic, pp 220–223

  • Flournoy RS, Callison-Burch C (2000) Reconciling user expectations and translation technology to create a useful real-world application. In: Proceedings of the Twenty-second International Conference on Translating and the Computer, 16–17 November 2000, London, United Kingdom

  • Federmann C (2010) Appraise: an open-source toolkit for manual phrase-based evaluation of translations. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC ’10), Valletta, Malta, May 2010

  • Guerberof A (2009) Productivity and quality in the post-editing of outputs from translation memories and machine translation. Localisation Focus 7(1):11–21

  • Hu C, Bederson BB, Resnik P (2010) Translation by iterative collaboration between monolingual users. In: Proceedings of Graphics Interface. GI ’10. Ottawa, Ontario, Canada, pp 39–46

  • Hu C, Bederson BB, Resnik P, Kronrod Y (2011) MonoTrans2: a new human computation system to support monolingual translation. In: Proceedings of the SIG-CHI Conference on Human Factors in Computing Systems. CHI ’11. Vancouver, BC, Canada, pp 1133–1136

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: ACL 2007 Proceedings of demo and poster sessions. Prague, Czech Republic, pp 177–180

  • Koehn P (2010) Enabling monolingual translators: post-editing vs. options. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. HLT ’10. Los Angeles, California: Association for Computational Linguistics, pp 537–545

  • Koskinen K, Suojanen T, Tuominen T (forthcoming) User-centered translation. Translation Practices Explained series. Routledge, London

  • Kumaran A, Saravanan K, Maurice S (2008) wikiBABEL: community creation of multilingual data. In: Proceedings of the 4th International Symposium on Wikis. ACM, New York, NY, USA, pp 14:1–14:11

  • Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174

  • LDC (2005) Linguistic data annotation specification: assessment of fluency and adequacy in translations. Revision 1.5

  • MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, pp 281–297

  • McDonough Dolmaya J (2012) Analyzing the crowdsourcing model and its impact on public perceptions of translation. Translator 18(2):167–191

  • Mesipuu M (2012) Translation crowdsourcing and user-translator motivation at Facebook and Skype. Translation spaces, vol 1. John Benjamins, Amsterdam, pp 33–53

  • Mitchell L, Roturier J (2012) Evaluation of machine-translated user-generated content: a pilot study based on user ratings. In: EAMT 2012: Proceedings of the 16th Annual Meeting of the European Association for Machine Translation. Trento, Italy, pp 61–64

  • Mitchell L, Roturier J, O’Brien S (2013) Community-based post-editing of machine-translated content: monolingual vs. bilingual. In: Machine Translation Summit XIV, Workshop on Post-editing Technology and Practice. Nice, France, pp 35–43

  • O’Brien S (2011) Towards predicting post-editing productivity. Mach Transl 25(3):197–215

  • O’Brien S (2012) Towards a dynamic quality evaluation model for translation. J Specialised Transl 17:55–77

  • O’Hagan M (2011) Community translation: translation as a social activity and its possible consequences in the advent of Web 2.0 and beyond. In: O’Hagan M (ed) Linguistica Antverpiensia New Series: Themes in Translation Studies. Translation as a social activity, 10/2011, pp 111–128

  • Perrino S (2009) User-generated translation: the future of translation in a Web 2.0 environment. J Specialised Transl 12:55–78

  • Pielmeier H, Kelly N (2012) Translation production models. Common Sense Advisory report. http://www.commonsenseadvisory.com/Portals/_default/Knowledgebase/ArticleImages/121130_R_Translation_Production_Models_Preview.pdf. Accessed 16 Jan 2014

  • Pym A (2011) Translation research terms: a tentative glossary for moments of perplexity and dispute. In: Pym A (ed) Translation research projects 3 (Online). Intercultural Studies Group, Tarragona, pp 75–110. http://www.isg.urv.es/publicity/isg/publications/trp_3_2011/index.htm. Accessed 25 July 2014

  • Risku H, Windhager F, Apfelthaler M (2013) A dynamic network model of translatorial cognition and action. Transl Spaces 2:151–182

  • Roturier J, Bensadoun A (2011) Evaluation of MT systems to translate user generated content. In: Proceedings of the Thirteenth Machine Translation Summit. Xiamen, China, pp 244–251

  • Roturier J, Mitchell L, Grabowski R, Siegel M (2012) Using automatic machine translation metrics to analyze the impact of source reformulations. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. San Diego, CA, pp 138–144

  • Roturier J, Mitchell L, Silva D (2013) The ACCEPT Post-Editing environment: a flexible and customizable online tool to perform and analyse machine translation post-editing. Machine Translation Summit XIV. Workshop on Post-editing Technology and Practice. Nice, France, pp 119–128

  • StataCorp LP (2014) kappa: interrater agreement. http://www.stata.com/manuals13/rkappa.pdf. Accessed 10 Jan 2014

  • Stymne S (2011) BLAST: a tool for error analysis of machine translated output. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Student Session. Portland, Oregon, pp 56–61

  • Symantec (2013) Browser-based client demonstrator and adapted post-editing environment and evaluation portal prototype. Deliverable 5.6, The ACCEPT Project (FP7/2007-2013 grant agreement no. 288769). http://www.accept.unige.ch/Products/D_5_6_Browser-based_client_demonstrator_and_adapted_post-editing_environment_and_evaluation_portal_prototypes.pdf. Accessed 15 Jan 2014

  • Tatsumi M, Aikawa T, Yamamoto K, Isahara H (2012) How good is crowd post-editing? Its potential and limitations. In: AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP 2012). Association for Machine Translation in the Americas (AMTA), San Diego, California, USA, pp 69–77

  • TAUS (2011) MT post-editing guidelines. https://www.taus.net/post-editing/machine-translation-post-editing-guidelines. Accessed 20 Apr 2013

  • Zaidan OF, Callison-Burch C (2011) Crowdsourcing translation: professional quality from non-professionals. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Portland, Oregon, pp 1220–1229


Acknowledgments

This research was funded by the European Union’s 7th Framework Programme via the ACCEPT Project (Grant agreement: 288769).

Corresponding author

Correspondence to Linda Mitchell.

Appendices

Appendix 1: post-editing guidelines (translated from German)

Tips for post-editing:

  • Edit the text according to your interpretation so that it becomes more fluent and clearer.

  • For example, try to correct word order and spelling where they make the text difficult or impossible to understand.

  • Leave words, parts of sentences or punctuation unedited if they are acceptable.

  • If you are working with reference to the original text, make sure that no information has been added or deleted.

Appendix 2: fluency and fidelity scales for evaluators (translated from German)

Fluency (Sprachfluss)

How would you rate the linguistic quality of the translation? It is:

5 perfect
4 good
3 non-native
2 disfluent
1 incomprehensible

Fidelity (Vollständigkeit)

How much of the meaning contained in the source text is also expressed in the target translation?

5 all of it
4 most of it
3 much of it
2 little of it
1 none of it
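
Agreement between evaluators on ordinal scales like these is commonly summarised with a weighted kappa and interpreted against the Landis and Koch (1977) benchmarks cited in the references. A minimal sketch with invented ratings (not the paper's data), using scikit-learn's implementation:

    # Weighted kappa between two evaluators' 5-point fluency ratings.
    # The ratings are invented for illustration.
    from sklearn.metrics import cohen_kappa_score

    rater_a = [5, 4, 3, 4, 2, 5, 3, 4]
    rater_b = [4, 4, 3, 5, 2, 5, 2, 4]

    # Linear weights penalise near-misses on an ordinal scale less heavily
    # than outright disagreements.
    kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")
    print(f"weighted kappa = {kappa:.2f}")  # 0.61-0.80 counts as 'substantial'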


Cite this article

Mitchell, L., O’Brien, S. & Roturier, J. Quality evaluation in community post-editing. Machine Translation 28, 237–262 (2014). https://doi.org/10.1007/s10590-014-9160-1
