
Statistical Modelling and Automatic Tagging of Human Translation Processes

Chapter in: New Directions in Empirical Translation Process Research

Part of the book series: New Frontiers in Translation Studies (NFTS)

Abstract

Advanced translation workbenches with detailed logging and eye-tracking capabilities greatly facilitate the recording of key strokes, mouse activity, or eye movement of translators and post-editors. The large-scale analysis of the resulting data logs, however, is still an open problem. In this chapter, we present and evaluate a statistical method to segment raw keylogging and eye-tracking data into distinct Human Translation Processes (HTPs), i.e., phases of specific human translation behavior, such as orientation, revision, or pause. We evaluate the performance of this automatic method against manual annotation by human experts with a background in Translation Process Research.
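As a rough, hypothetical illustration of the kind of input the method operates on, the sketch below bins a toy keystroke and fixation log into fixed-width time windows of per-type action counts: the sort of observation sequence that a statistical tagger can then label with HTPs. The event names and window size are assumptions for illustration, not the chapter's actual feature set.

```python
from collections import Counter

# Hypothetical event log: (timestamp_ms, event_type) pairs of the kind a
# keylogging/eye-tracking workbench records. Names and window size are
# illustrative assumptions only.
events = [(120, "keystroke"), (450, "fixation_st"), (910, "keystroke"),
          (1800, "fixation_tt"), (2400, "keystroke"), (5200, "fixation_st")]

WINDOW_MS = 1000  # fixed window width used to segment the session

def to_observations(events, window_ms=WINDOW_MS):
    """Turn a raw event log into a sequence of per-window event-type counts."""
    if not events:
        return []
    n_windows = events[-1][0] // window_ms + 1
    windows = [Counter() for _ in range(n_windows)]
    for ts, etype in events:
        windows[ts // window_ms][etype] += 1
    return windows

for i, counts in enumerate(to_observations(events)):
    print(f"window {i}: {dict(counts)}")
```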


Notes

  1.

    We acknowledge that this view is not shared by everyone. For example, Moritz Schaeffer (personal communication) argues that post-editing should be considered a task/skill distinct from translation per se. Not every effective translator is an effective post-editor, and vice versa.

  2.

    Also referred to as thinking-aloud protocols (Toury 1995) and talk-aloud protocols (Gerloff 1986).

  3.

    Casmacat is based on the web-based MateCat workbench, which is deployed and actively used in production at several translation and IT companies (Federico et al. 2014).

  4.

    The data are available from http://sourceforge.net/projects/tprdb/.

  5.

    The data are available from http://www.casmacat.eu/?n=Main.Downloads.

  6.

    Of course we could ask them, but that would interrupt precisely those mental processes that we want to eavesdrop on and force the translator to reflect on what might otherwise be a subconscious, automatic process. This is one of the main arguments against think-aloud experimental protocols.

  7.

    In fact, a mixture of Poisson distributions would have been the more appropriate choice here, as the action counts are discrete rather than continuous. Because the actual data are skewed, the mixture-model approach lets us fit the asymmetrically distributed counts better than a single symmetric distribution would. An even better option would be a more general two-parameter model such as the Conway-Maxwell-Poisson distribution, which allows a better fit to heavy or thin tails in the distribution (see Shmueli et al. 2005 for details on the Conway-Maxwell-Poisson distribution). A minimal Poisson-mixture sketch follows these notes.

  8.

    We have implemented the modelling approach described in this chapter in segcats, available at http://github.com/laeubli/segcats. The clustering and EM algorithms are based on scikit-learn version 0.14.1 (see Pedregosa et al. 2011 and http://scikit-learn.org/stable/); a sketch of this initialise-then-re-estimate recipe follows these notes.

  9.

    Details can be found in Appendix B of Läubli (2014).

  10.

    Recall that, in the models learnt with the unsupervised sequence modelling approach proposed in Sect. 8.4, each HMM state corresponds to an automatically learnt HTP; a minimal Viterbi decoding sketch for assigning such state labels follows these notes.
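
As a concrete companion to note 7, the following minimal sketch fits a two-component Poisson mixture to skewed, discrete per-window action counts with EM. It is a generic, textbook-style mixture fit on toy data, not the chapter's actual model or the segcats implementation.

```python
import numpy as np
from scipy.stats import poisson

# Toy, skewed action counts: many low-activity windows plus some busy ones.
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(1.0, 300), rng.poisson(12.0, 100)])

K = 2
lam = np.array([counts.mean() * 0.5, counts.mean() * 1.5])  # initial Poisson rates
pi = np.full(K, 1.0 / K)                                    # initial mixture weights

for _ in range(100):
    # E-step: posterior responsibility of each component for each count
    probs = pi * poisson.pmf(counts[:, None], lam)          # shape (N, K)
    resp = probs / probs.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixture weights and component rates
    nk = resp.sum(axis=0)
    pi = nk / len(counts)
    lam = (resp * counts[:, None]).sum(axis=0) / nk

print("mixture weights:", pi.round(3), "Poisson rates:", lam.round(2))
```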
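
As a companion to note 8: segcats builds on scikit-learn 0.14.1, whose HMM code has since been spun off into the separate hmmlearn package. The sketch below shows the general recipe of k-means++ initialisation (Arthur and Vassilvitskii 2007) followed by Baum-Welch/EM re-estimation of a Gaussian-emission HMM; the feature layout, number of states, and the use of hmmlearn are assumptions for illustration, not the segcats code itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn.hmm import GaussianHMM  # successor to scikit-learn's old hmm module

# Toy per-window feature vectors (e.g. keystroke/fixation counts) for two sessions.
rng = np.random.default_rng(1)
X = rng.poisson([2.0, 5.0, 1.0], size=(500, 3)).astype(float)
lengths = [250, 250]          # observations per session (sequence boundaries)

N_STATES = 4                  # number of candidate HTPs to induce

# k-means++ clustering provides the initial emission means for the HMM states.
km = KMeans(n_clusters=N_STATES, init="k-means++", n_init=10, random_state=1).fit(X)

hmm = GaussianHMM(n_components=N_STATES, covariance_type="diag", n_iter=50,
                  init_params="stc")              # initialise all params except the means
hmm.means_ = km.cluster_centers_                  # seed emission means from k-means++
hmm.fit(X, lengths)                               # Baum-Welch (EM) re-estimation

print("learnt state means:\n", hmm.means_.round(2))
```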
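
Finally, as a companion to note 10, here is a minimal log-space Viterbi decoder (Viterbi 1967) that assigns the most likely HMM state, i.e. an automatically learnt HTP label, to each observation window. The state names and toy parameters are hypothetical; in practice the transition and emission scores come from the trained model.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state sequence; log_emit[t, s] is the log-likelihood of
    observation t under state s."""
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)        # best log-score of a path ending in each state
    backptr = np.zeros((T, S), dtype=int)   # predecessor state on that best path
    delta[0] = log_start + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans    # (previous state, current state)
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = [int(delta[-1].argmax())]        # best final state, then follow back-pointers
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]

# Hypothetical 4-state model with "sticky" transitions and random toy emission scores.
states = ["orientation", "drafting", "revision", "pause"]
log_start = np.log([0.7, 0.1, 0.1, 0.1])
log_trans = np.log(np.full((4, 4), 0.05) + np.eye(4) * 0.80)   # each row sums to 1
log_emit = np.log(np.random.default_rng(2).dirichlet(np.ones(4), size=10))

print([states[s] for s in viterbi(log_start, log_trans, log_emit)])
```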

References

  • Alabau, V., Buck, C., Carl, M., Casacuberta, F., García-Martínez, M., Germann, U., et al. (2014). Casmacat: A computer-assisted translation workbench. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden (pp. 25–28).


  • Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA (pp. 1027–1035).


  • Baum, L. E. (1972). An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Proceedings of the 3rd Symposium on Inequalities, Los Angeles, CA (pp. 1–8).


  • Carl, M. (2010). A computational framework for a cognitive model of human translation processes. In Proceedings of ASLIB Translating and the Computer (Vol. 32), London, UK.


  • Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and writing research. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey (pp. 4108–4112).


  • Carl, M., & Jakobsen, A. L. (2009). Towards statistical modelling of translators’ activity data. International Journal of Speech Technology, 12(4), 125–138.


  • Carl, M., & Kay, M. (2011). Gazing and typing activities during translation: A comparative study of translation units of professional and student translators. Meta, 56(4), 952–975.


  • Carl, M., García, M. M., & Mesa-Lao, B. (2014). CFT13: A resource for research into the post-editing process. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland (pp. 1757–1764).


  • Chinea-Rios, M., Sanchis-Trilles, G., Ortiz-Martínez, D., & Casacuberta, F. (2014). Online optimisation of log-linear weights in interactive machine translation. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland (pp. 3556–3559).


  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38.


  • Dragsted, B., & Carl, M. (2013). Towards a classification of translation styles based on eye-tracking and keylogging data. Journal of Writing Research, 5(1), 133–158.


  • Elming, J., Winther Balling, L., & Carl, M. (2014). Investigating user behaviour in post-editing and translation using the CASMACAT workbench. In S. O’Brien, L. Winther Balling, M. Carl, M. Simard, & L. Specia (Eds.), Post-editing of machine translation (pp. 147–169). Newcastle upon Tyne: Cambridge Scholars Publishing.


  • Federico, M., Bertoldi, N., Cettolo, M., Negri, M., Turchi, M., Trombetti, M., et al. (2014). The MateCat tool. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland (pp. 129–132).


  • Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378.


  • Gerloff, P. (1986). Second language learners’ reports on the interpretive process: Talk-aloud protocols of translation. In J. House & S. Blum-Kulka (Eds.), Interlingual and intercultural communication: Discourse and cognition in translation and second language acquisition studies (pp. 243–262). Tübingen: Narr.


  • Green, S., Heer, J., & Manning, C. D. (2013). The efficacy of human post-editing for language translation. In Proceedings of the 2013 ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Paris, France.


  • Guerberof, A. (2009). Productivity and quality in the post-editing of outputs from translation memories and machine translation. International Journal of Localisation, 7(1), 11–21.


  • Hvelplund, K. (2011). Allocation of cognitive resources in translation: An eye-tracking and key-logging study. Ph.D. thesis, Copenhagen Business School, Copenhagen, Denmark.


  • Jakobsen, A. L. (1999). Logging target text production with Translog. Copenhagen Studies in Language, 24, 9–20.


  • Jakobsen, A. L. (2003). Effects of think aloud on translation speed, revision and segmentation. In F. Alves (Ed.), Triangulating translation. Benjamins translation library (Vol. 45, pp. 69–95). Amsterdam, Netherlands: John Benjamins.


  • Jakobsen, A. L. (2011). Tracking translators’ keystrokes and eye movements with Translog. In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in translation studies. Benjamins translation library (Vol. 94, pp. 37–56). Amsterdam, Netherlands: John Benjamins.


  • Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.


  • Koehn, P., & Germann, U. (2014). The impact of machine translation quality on human post-editing. In Proceedings of the Workshop on Humans and Computer-assisted Translation (HaCaT), Gothenburg, Sweden (pp 38–46).


  • Koponen, M., Aziz, W., Ramos, L., & Specia, L. (2012). Post-editing time as a measure of cognitive effort. In Proceedings of the AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP), San Diego, CA (pp. 11–20).


  • Krings, H. P. (1995). Texte reparieren. Empirische Untersuchungen zum Prozeß der Nachredaktion von Maschinenübersetzungen. Habilitation thesis, University of Hildesheim, Hildesheim, Germany.


  • Krings, H. P. (2001). Repairing texts: Empirical investigations of machine translation post-editing processes. Kent, OH: Kent State University Press.


  • Krings, H. P. (2005). Wege ins Labyrinth – Fragestellungen und Methoden der Übersetzungsprozessforschung im Überblick. Meta, 50(2), 342–358.


  • Läubli, S. (2014). Statistical modelling of human translation processes. Master’s thesis, University of Edinburgh, Edinburgh, UK.


  • Läubli, S., Fishel, M., Massey, G., Ehrensberger-Dow, M., & Volk, M. (2013). Assessing post-editing efficiency in a realistic translation environment. In Proceedings of the 2nd Workshop on Post-Editing Technology and Practice (WPTP), Nice, France (pp. 83–91).


  • Martínez-Gómez, P., Minocha, A., Huang, J., Carl, M., Bangalore, S., & Aizawa, A. (2014). Recognition of translator expertise using sequences of fixations and keystrokes. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), Safety Harbor, FL (pp. 229–302).


  • Massey, G., & Ehrensberger-Dow, M. (2014). Looking beyond the text: The usefulness of translation process data. In D. Knorr, C. Heine, & J. Engberg (Eds.), Methods in writing process research (pp. 81–89). Frankfurt am Main, Germany: Peter Lang.


  • O’Brien, S. (2006). Pauses as indicators of cognitive effort in post-editing machine translation output. Across Languages and Cultures, 7(1), 1–21.


  • O’Brien, S. (2007). Eye-tracking and translation memory matches. Perspectives: Studies in Translatology, 14(3), 185–205.


  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., & Grisel, O. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.


  • Perrin, D. (2003). Progression analysis (PA): Investigating writing strategies at the workplace. Journal of Pragmatics, 35(6), 907–921.


  • Plitt, M., & Masselot, F. (2010). A productivity test of statistical machine translation post-editing in a typical localisation context. Prague Bulletin of Mathematical Linguistics, 93, 7–16.


  • Sanchis-Trilles, G., Ortiz-Martínez, D., & Casacuberta, F. (2014). Efficient wordgraph pruning for interactive translation prediction. In Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT), Dubrovnik, Croatia (pp. 27–34).


  • Schaeffer, M., & Carl, M. (2014). Measuring the cognitive effort of literal translation processes. In Proceedings of the Workshop on Humans and Computer-assisted Translation (HaCaT), Gothenburg, Sweden (pp. 29–37).


  • Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., & Boatwright, P. (2005). A useful distribution for fitting discrete data: Revival of the Conway–Maxwell–Poisson distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1), 127–142.


  • Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257–268.


  • Toury, G. (1995). Descriptive translation studies and beyond. Benjamins translation library (Vol. 4). Amsterdam, Netherlands: John Benjamins.


  • Underwood, N., Mesa-Lao, B., Martinez, M. G., Carl, M., Alabau, V., Gonzalez-Rubio, J., et al. (2014). Evaluating the effects of interactivity in a post-editing workbench. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland (pp. 553–559).


  • Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.



Acknowledgements

This work was supported in part by the European Union Seventh Framework Programme for Research, Technological Development and Demonstration (FP7/2007–2013) under grant agreement no. 287576 (Casmacat).

Author information


Corresponding author

Correspondence to Samuel Läubli.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Läubli, S., Germann, U. (2016). Statistical Modelling and Automatic Tagging of Human Translation Processes. In: Carl, M., Bangalore, S., Schaeffer, M. (eds) New Directions in Empirical Translation Process Research. New Frontiers in Translation Studies. Springer, Cham. https://doi.org/10.1007/978-3-319-20358-4_8


  • DOI: https://doi.org/10.1007/978-3-319-20358-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20357-7

  • Online ISBN: 978-3-319-20358-4

  • eBook Packages: Computer Science (R0)
