Machine Translation, Volume 28, Issue 3–4, pp 187–216

Indices of cognitive effort in machine translation post-editing

  • Lucas Nunes Vieira


Identifying indices of effort in post-editing of machine translation can have a number of applications, including estimating machine translation quality and calculating post-editors’ pay rates. Source-text features, machine-output features and subjects’ traits are investigated here in view of their impact on cognitive effort, which is measured with eye tracking and a subjective scale borrowed from the field of Educational Psychology. Data is analysed with mixed-effects models, and results indicate that the semantics-based automatic evaluation metric Meteor is significantly correlated with all measures of cognitive effort considered. Smaller effects are also observed for source-text linguistic features. Further insight is provided into the role of the source text in post-editing, with results suggesting that consulting the source text is associated with how cognitively demanding the task is perceived to be only for subjects with a low level of proficiency in the source language. Subjects’ working memory capacity was also taken into account, and a relationship with post-editing productivity was observed. Scaled-up studies into the construct of working memory capacity and the use of eye tracking in models for quality estimation are suggested as future work.
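As a rough illustration of the kind of mixed-effects model the abstract mentions, the generic form below relates a cognitive-effort measure to a sentence-level predictor such as a Meteor score. It is a sketch under assumptions, not the specification used in the article: the abstract does not state the random-effects structure, so the crossed random intercepts for subjects and sentences, the single predictor, and all symbols are illustrative choices.

```latex
% Illustrative only: symbols and random-effects structure are assumptions,
% not taken from the article.
% y_{ij}: effort measure for subject i on sentence j (e.g. a fixation-based measure)
% x_j   : sentence-level predictor (e.g. the Meteor score of sentence j)
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 x_j + u_i + w_j + \varepsilon_{ij}, \\
  u_i &\sim \mathcal{N}(0, \sigma_u^2), \quad
  w_j \sim \mathcal{N}(0, \sigma_w^2), \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Here \(\beta_1\) is the fixed effect of interest, while \(u_i\) and \(w_j\) absorb variation attributable to individual subjects and individual sentences, so that repeated observations from the same subject or sentence are not treated as independent.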


Keywords: Post-editing, Cognitive effort, Eye tracking, Meteor, Working memory capacity



This research has been supported by the School of Modern Languages at Newcastle University. Particular gratitude is extended to research participants as well as Dr Francis Jones, Dr Michael Jin, and Dr Ya-Yun Chen.


Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. School of Modern Languages, Newcastle University, Newcastle upon Tyne, UK
