Abstract
Despite much research on machine translation (MT) evaluation, there is surprisingly little work that directly measures users’ intuitive or emotional preferences regarding different types of MT errors. However, the elicitation and modeling of user preferences is an important prerequisite for research on user adaptation and customization of MT engines. In this paper we explore the use of conjoint analysis as a formal quantitative framework to assess users’ relative preferences for different types of translation errors. We apply our approach to the analysis of MT output from translating public health documents from English into Spanish. Our results indicate that word order errors are clearly the most dispreferred error type, followed by word sense, morphological, and function word errors. The conjoint analysis-based model predicts user preferences more accurately than a baseline model that chooses the translation with the fewest errors overall. Additionally, we analyze the effect of using a crowd-sourced respondent population versus a sample of domain experts and observe that the main preference effects are remarkably stable across the two samples.
Acknowledgments
We are grateful to Aurora Salvador Sanchis and Lorena Ruiz Marcos for providing the error annotations and corrections, to Megumu Brownstein for recruiting the domain experts, and to Kate Cole for comments on an earlier draft of this paper. This study was funded by Grant #1R01LM010811-01 from the National Library of Medicine (NLM). Its content is solely the responsibility of the authors and does not necessarily represent the view of the NLM.
Kirchhoff, K., Capurro, D. & Turner, A.M. A conjoint analysis framework for evaluating user preferences in machine translation. Machine Translation 28, 1–17 (2014). https://doi.org/10.1007/s10590-013-9140-x