
A conjoint analysis framework for evaluating user preferences in machine translation

Machine Translation

Abstract

Despite much research on machine translation (MT) evaluation, there is surprisingly little work that directly measures users' intuitive or emotional preferences regarding different types of MT errors. Yet eliciting and modeling user preferences is an important prerequisite for research on user adaptation and customization of MT engines. In this paper, we explore the use of conjoint analysis as a formal quantitative framework for assessing users' relative preferences for different types of translation errors. We apply our approach to MT output for public health documents translated from English into Spanish. Our results indicate that word order errors are clearly the most dispreferred error type, followed by word sense, morphological, and function word errors. The conjoint analysis-based model predicts user preferences more accurately than a baseline model that chooses the translation with the fewest errors overall. Additionally, we analyze the effect of using a crowd-sourced respondent population versus a sample of domain experts, and observe that the main preference effects are remarkably stable across the two samples.



Acknowledgments

We are grateful to Aurora Salvador Sanchis and Lorena Ruiz Marcos for providing the error annotations and corrections, to Megumu Brownstein for recruiting the domain experts, and to Kate Cole for comments on an earlier draft of this paper. This study was funded by Grant #1R01LM010811-01 from the National Library of Medicine (NLM). Its content is solely the responsibility of the authors and does not necessarily represent the views of the NLM.

Author information


Corresponding author

Correspondence to Katrin Kirchhoff.


About this article

Cite this article

Kirchhoff, K., Capurro, D. & Turner, A.M. A conjoint analysis framework for evaluating user preferences in machine translation. Machine Translation 28, 1–17 (2014). https://doi.org/10.1007/s10590-013-9140-x



  • DOI: https://doi.org/10.1007/s10590-013-9140-x
