Advanced translation workbenches with detailed logging and eye-tracking capabilities greatly facilitate the recording of keystrokes, mouse activity, and eye movements of translators and post-editors. The large-scale analysis of the resulting data logs, however, is still an open problem. In this chapter, we present a statistical method to segment raw keylogging and eye-tracking data into distinct Human Translation Processes (HTPs), i.e., phases of specific human translation behavior, such as orientation, revision, or pause. We evaluate the performance of this automatic method against manual annotation by human experts with a background in Translation Process Research.
- Computer-aided translation
- Computer-assisted translation
- Quantitative data analysis
- Translation processes
- Unsupervised sequence modelling
We acknowledge that this view is not shared by everyone. For example, Moritz Schaeffer (personal communication) argues that post-editing should be considered a task/skill distinct from translation per se. Not every effective translator is an effective post-editor, and vice versa.
Casmacat is based on the web-based MateCat workbench, which is deployed and actively used in production at several translation and IT companies (Federico et al. 2014).
The data are available from http://sourceforge.net/projects/tprdb/.
The data are available from http://www.casmacat.eu/?n=Main.Downloads.
Of course we could ask them, but that would interrupt precisely those mental processes that we want to eavesdrop on and force the translator to reflect on what might otherwise be a subconscious, automatic process. This is one of the main arguments against think-aloud experimental protocols.
In fact, a mixture of Poisson distributions would have been the appropriate choice here, as the action counts are discrete rather than continuous data. The mixture model approach allows us to fit the asymmetrically distributed (skewed) data even though the individual component distributions are symmetric. An even better option would be to use a more general two-parameter model such as the Conway-Maxwell-Poisson distribution, which allows a better fit to heavy or thin tails in the distribution (see Shmueli et al. 2005 for details on the Conway-Maxwell-Poisson distribution).
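As an illustration of the Poisson mixture mentioned in this note, the following is a minimal sketch of fitting such a mixture to count data with expectation-maximisation (Dempster et al. 1977). It is not the chapter's implementation: the function names, the quantile-based initialisation, the number of components, and the synthetic counts in the usage example are all assumptions made for illustration.

```python
import math
import numpy as np


def poisson_logpmf(counts, rate):
    """Log-probability of each count under a Poisson distribution with the given rate."""
    log_factorials = np.array([math.lgamma(k + 1.0) for k in counts])
    return counts * np.log(rate) - rate - log_factorials


def fit_poisson_mixture(counts, n_components=2, n_iter=200):
    """Fit mixture weights and Poisson rates to count data with EM.

    Hypothetical helper for illustration only; initialisation and
    iteration count are arbitrary choices, not taken from the chapter.
    """
    counts = np.asarray(counts, dtype=float)
    # Spread the initial rates over the data range; the offset avoids a zero rate.
    rates = np.quantile(counts, np.linspace(0.1, 0.9, n_components)) + 0.5
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        log_joint = np.stack(
            [np.log(w) + poisson_logpmf(counts, r) for w, r in zip(weights, rates)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # M-step: re-estimate weights and rates from the responsibilities.
        totals = np.maximum(resp.sum(axis=0), 1e-12)
        weights = totals / counts.size
        rates = np.maximum((resp * counts[:, None]).sum(axis=0) / totals, 1e-9)
    return weights, rates


if __name__ == "__main__":
    # Synthetic action counts: many low-activity windows, fewer high-activity ones.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.poisson(2, 700), rng.poisson(15, 300)])
    weights, rates = fit_poisson_mixture(data)
    print("weights:", weights, "rates:", rates)
```

Each component is a one-parameter Poisson, so the skew of the overall distribution is captured by combining several rates rather than by any single component.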
Details can be found in Appendix B of Läubli (2014).
Recall that each HMM state corresponds to an automatically learnt HTP in the models learnt through the unsupervised sequence modelling approach proposed in Sect. 8.4.
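To make the correspondence between HMM states and HTPs concrete, here is a minimal sketch of how a decoded state sequence can be collapsed into labelled segments. It assumes Viterbi decoding (Viterbi 1967) over a toy two-state model; the probabilities, the labels "HTP-A" and "HTP-B", and the function names are hypothetical placeholders, not values or names from the chapter.

```python
import numpy as np


def viterbi(log_start, log_trans, log_emit):
    """Most likely state sequence for an HMM.

    log_start: (S,) log initial state probabilities.
    log_trans: (S, S) log transition probabilities, row = from, column = to.
    log_emit:  (T, S) log-likelihood of each observation under each state.
    """
    n_steps, n_states = log_emit.shape
    score = np.empty((n_steps, n_states))
    backptr = np.zeros((n_steps, n_states), dtype=int)
    score[0] = log_start + log_emit[0]
    for t in range(1, n_steps):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        backptr[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(n_steps, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n_steps - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path


def to_segments(path, labels):
    """Collapse runs of identical states into (label, start, end) tuples; end is exclusive."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((labels[path[start]], start, t))
            start = t
    return segments


if __name__ == "__main__":
    # Toy model: two "sticky" states and six observation windows (all values assumed).
    log_start = np.log([0.5, 0.5])
    log_trans = np.log([[0.8, 0.2], [0.2, 0.8]])
    log_emit = np.log([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)
    path = viterbi(log_start, log_trans, log_emit)
    print(to_segments(path, ["HTP-A", "HTP-B"]))
```

Each resulting segment is a contiguous stretch of observation windows assigned to the same state, which is the sense in which a state can be read as one automatically learnt HTP.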
Alabau, V., Buck, C., Carl, M., Casacuberta, F., García-Martínez, M., Germann, U., et al. (2014). Casmacat: A computer-assisted translation workbench. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden (pp. 25–28).
Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA (pp. 1027–1035).
Baum, L. E. (1972). An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Proceedings of the 3rd Symposium on Inequalities, Los Angeles, CA (pp. 1–8).
Carl, M. (2010). A computational framework for a cognitive model of human translation processes. In Proceedings of ASLIB Translating and the Computer (Vol. 32), London, UK.
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and writing research. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey (pp. 4108–4112).
Carl, M., & Jakobsen, A. L. (2009). Towards statistical modelling of translators’ activity data. International Journal of Speech Technology, 12(4), 125–138.
Carl, M., & Kay, M. (2011). Gazing and typing activities during translation: A comparative study of translation units of professional and student translators. Meta, 56(4), 952–975.
Carl, M., García, M. M., & Mesa-Lao, B. (2014). CFT13: A resource for research into the post-editing process. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland (pp. 1757–1764).
Chinea-Rios, M., Sanchis-Trilles, G., Ortiz-Martínez, D., & Casacuberta, F. (2014). Online optimisation of log-linear weights in interactive machine translation. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland (pp. 3556–3559).
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38.
Dragsted, B., & Carl, M. (2013). Towards a classification of translation styles based on eye-tracking and keylogging data. Journal of Writing Research, 5(1), 133–158.
Elming, J., Winther Balling, L., & Carl, M. (2014). Investigating user behaviour in post-editing and translation using the CASMACAT workbench. In S. O’Brien, L. Winther Balling, M. Carl, M. Simard, & L. Specia (Eds.), Post-editing of machine translation (pp. 147–169). Newcastle upon Tyne: Cambridge Scholars Publishing.
Federico, M., Bertoldi, N., Cettolo, M., Negri, M., Turchi, M., Trombetti, M., et al. (2014). The MateCat tool. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland (pp. 129–132).
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378.
Gerloff, P. (1986). Second language learners’ reports on the interpretive process: Talk-aloud protocols of translation. In J. House & S. Blum-Kulka (Eds.), Interlingual and intercultural communication: Discourse and cognition in translation and second language acquisition studies (pp. 243–262). Tübingen: Narr.
Green, S., Heer, J., & Manning, C. D. (2013). The efficacy of human post-editing for language translation. In Proceedings of the 2013 ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Paris, France.
Guerberof, A. (2009). Productivity and quality in the post-editing of outputs from translation memories and machine translation. International Journal of Localisation, 7(1), 11–21.
Hvelplund, K. (2011). Allocation of cognitive resources in translation: An eye-tracking and key-logging study. Ph.D. thesis, Copenhagen Business School, Copenhagen, Denmark.
Jakobsen, A. L. (1999). Logging target text production with Translog. Copenhagen Studies in Language, 24, 9–20.
Jakobsen, A. L. (2003). Effects of think aloud on translation speed, revision and segmentation. In F. Alves (Ed.), Triangulating translation. Benjamins translation library (Vol. 45, pp. 69–95). Amsterdam, Netherlands: John Benjamins.
Jakobsen, A. L. (2011). Tracking translators’ keystrokes and eye movements with Translog. In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in translation studies. Benjamins translation library (Vol. 94, pp. 37–56). Amsterdam, Netherlands: John Benjamins.
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.
Koehn, P., & Germann, U. (2014). The impact of machine translation quality on human post-editing. In Proceedings of the Workshop on Humans and Computer-assisted Translation (HaCaT), Gothenburg, Sweden (pp. 38–46).
Koponen, M., Aziz, W., Ramos, L., & Specia, L. (2012). Post-editing time as a measure of cognitive effort. In Proceedings of the AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP), San Diego, CA (pp. 11–20).
Krings, H. P. (1995). Texte reparieren. Empirische Untersuchungen zum Prozeß der Nachredaktion von Maschinenübersetzungen. Habilitation thesis, University of Hildesheim, Hildesheim, Germany.
Krings, H. P. (2001). Repairing texts: Empirical investigations of machine translation post-editing processes. Kent, OH: Kent State University Press.
Krings, H. P. (2005). Wege ins Labyrinth – Fragestellungen und Methoden der Übersetzungsprozessforschung im Überblick. Meta, 50(2), 342–358.
Läubli, S. (2014). Statistical modelling of human translation processes. Master’s thesis, University of Edinburgh, Edinburgh, UK.
Läubli, S., Fishel, M., Massey, G., Ehrensberger-Dow, M., & Volk, M. (2013). Assessing post-editing efficiency in a realistic translation environment. In Proceedings of the 2nd Workshop on Post-Editing Technology and Practice (WPTP), Nice, France (pp. 83–91).
Martínez-Gómez, P., Minocha, A., Huang, J., Carl, M., Bangalore, S., & Aizawa, A. (2014). Recognition of translator expertise using sequences of fixations and keystrokes. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), Safety Harbor, FL (pp. 229–302).
Massey, G., & Ehrensberger-Dow, M. (2014). Looking beyond the text: The usefulness of translation process data. In D. Knorr, C. Heine, & J. Engberg (Eds.), Methods in writing process research (pp. 81–89). Frankfurt am Main, Germany: Peter Lang.
O’Brien, S. (2006). Pauses as indicators of cognitive effort in post-editing machine translation output. Across Languages and Cultures, 7(1), 1–21.
O’Brien, S. (2007). Eye-tracking and translation memory matches. Perspectives: Studies in Translatology, 14(3), 185–205.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., & Grisel, O. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Perrin, D. (2003). Progression analysis (PA): Investigating writing strategies at the workplace. Journal of Pragmatics, 35(6), 907–921.
Plitt, M., & Masselot, F. (2010). A productivity test of statistical machine translation post-editing in a typical localisation context. Prague Bulletin of Mathematical Linguistics, 93, 7–16.
Sanchis-Trilles, G., Ortiz-Martínez, D., & Casacuberta, F. (2014). Efficient wordgraph pruning for interactive translation prediction. In Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT), Dubrovnik, Croatia (pp. 27–34).
Schaeffer, M., & Carl, M. (2014). Measuring the cognitive effort of literal translation processes. In Proceedings of the Workshop on Humans and Computer-assisted Translation (HaCaT), Gothenburg, Sweden (pp. 29–37).
Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., & Boatwright, P. (2005). A useful distribution for fitting discrete data: Revival of the Conway–Maxwell–Poisson distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1), 127–142.
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257–268.
Toury, G. (1995). Descriptive translation studies and beyond. Benjamins translation library (Vol. 4). Amsterdam, Netherlands: John Benjamins.
Underwood, N., Mesa-Lao, B., Martinez, M. G., Carl, M., Alabau, V., Gonzalez-Rubio, J., et al. (2014). Evaluating the effects of interactivity in a post-editing workbench. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland (pp. 553–559).
Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.
This work was supported in part by the European Union Seventh Framework Programme for Research, Technological Development and Demonstration (FP7/2007–2013) under grant agreement no. 287576 (Casmacat).
© 2016 Springer International Publishing Switzerland
Läubli, S., Germann, U. (2016). Statistical Modelling and Automatic Tagging of Human Translation Processes. In: Carl, M., Bangalore, S., Schaeffer, M. (eds) New Directions in Empirical Translation Process Research. New Frontiers in Translation Studies. Springer, Cham. https://doi.org/10.1007/978-3-319-20358-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20357-7
Online ISBN: 978-3-319-20358-4