Abstract
Despite much research on machine translation (MT) evaluation, there is surprisingly little work that directly measures users’ intuitive or emotional preferences regarding different types of MT errors. However, the elicitation and modeling of user preferences is an important prerequisite for research on user adaptation and customization of MT engines. In this paper we explore the use of conjoint analysis as a formal quantitative framework to assess users’ relative preferences for different types of translation errors. We apply our approach to the analysis of MT output from translating public health documents from English into Spanish. Our results indicate that word order errors are clearly the most dispreferred error type, followed by word sense, morphological, and function word errors. The conjoint analysis-based model predicts user preferences more accurately than a baseline model that chooses the translation with the fewest errors overall. Additionally, we analyze the effect of using a crowd-sourced respondent population versus a sample of domain experts and observe that the main preference effects are remarkably stable across the two samples.
Acknowledgments
We are grateful to Aurora Salvador Sanchis and Lorena Ruiz Marcos for providing the error annotations and corrections, to Megumu Brownstein for recruiting the domain experts, and to Kate Cole for comments on an earlier draft of this paper. This study was funded by Grant #1R01LM010811-01 from the National Library of Medicine (NLM). Its content is solely the responsibility of the authors and does not necessarily represent the view of the NLM.
Kirchhoff, K., Capurro, D. & Turner, A.M. A conjoint analysis framework for evaluating user preferences in machine translation. Machine Translation 28, 1–17 (2014). https://doi.org/10.1007/s10590-013-9140-x