
Statistical Modelling and Automatic Tagging of Human Translation Processes

Chapter in: New Directions in Empirical Translation Process Research

Part of the book series: New Frontiers in Translation Studies (NFTS)

Abstract

Advanced translation workbenches with detailed logging and eye-tracking capabilities greatly facilitate the recording of key strokes, mouse activity, or eye movement of translators and post-editors. The large-scale analysis of the resulting data logs, however, is still an open problem. In this chapter, we present and evaluate a statistical method to segment raw keylogging and eye-tracking data into distinct Human Translation Processes (HTPs), i.e., phases of specific human translation behavior, such as orientation, revision, or pause. We evaluate the performance of this automatic method against manual annotation by human experts with a background in Translation Process Research.
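As a rough, hypothetical illustration of the kind of input the method operates on, the sketch below bins a toy keystroke and fixation log into fixed-width time windows of per-type action counts: the sort of observation sequence that a statistical tagger can then label with HTPs. The event names and window size are assumptions for illustration, not the chapter's actual feature set.

```python
from collections import Counter

# Hypothetical event log: (timestamp_ms, event_type) pairs of the kind a
# keylogging/eye-tracking workbench records. Names and window size are
# illustrative assumptions only.
events = [(120, "keystroke"), (450, "fixation_st"), (910, "keystroke"),
          (1800, "fixation_tt"), (2400, "keystroke"), (5200, "fixation_st")]

WINDOW_MS = 1000  # fixed window width used to segment the session

def to_observations(events, window_ms=WINDOW_MS):
    """Turn a raw event log into a sequence of per-window event-type counts."""
    if not events:
        return []
    n_windows = events[-1][0] // window_ms + 1
    windows = [Counter() for _ in range(n_windows)]
    for ts, etype in events:
        windows[ts // window_ms][etype] += 1
    return windows

for i, counts in enumerate(to_observations(events)):
    print(f"window {i}: {dict(counts)}")
```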


Notes

  1.

    We acknowledge that this view is not shared by everyone. For example, Moritz Schaeffer (personal communication) argues that post-editing should be considered a task/skill distinct from translation per se. Not every effective translator is an effective post-editor, and vice versa.

  2.

    Also referred to as thinking-aloud protocols (Toury 1995) and talk-aloud protocols (Gerloff 1986).

  3.

    Casmacat is based on the web-based MateCat workbench, which is deployed and actively used in production at several translation and IT companies (Federico et al. 2014).

  4.

    The data are available from http://sourceforge.net/projects/tprdb/.

  5.

    The data are available from http://www.casmacat.eu/?n=Main.Downloads.

  6.

    Of course we could ask them, but that would interrupt precisely those mental processes that we want to eavesdrop on and force the translator to reflect on what might otherwise be a subconscious, automatic process. This is one of the main arguments against think-aloud experimental protocols.

  7.

    In fact, a mixture of Poisson distributions would have been the more appropriate choice here, as the action counts are discrete rather than continuous. Because the actual data are skewed, the mixture-model approach lets us fit the asymmetrically distributed counts better than a single symmetric distribution would. An even better option would be a more general two-parameter model such as the Conway-Maxwell-Poisson distribution, which allows a better fit to heavy or thin tails in the distribution (see Shmueli et al. 2005 for details on the Conway-Maxwell-Poisson distribution). A minimal Poisson-mixture sketch follows these notes.

  8.

    We have implemented the modelling approach described in this chapter in segcats, available at http://github.com/laeubli/segcats. The clustering and EM algorithms are based on scikit-learn version 0.14.1 (see Pedregosa et al. 2011 and http://scikit-learn.org/stable/); a sketch of this initialise-then-re-estimate recipe follows these notes.

  9.

    Details can be found in Appendix B of Läubli (2014).

  10.

    Recall that, in the models learnt with the unsupervised sequence modelling approach proposed in Sect. 8.4, each HMM state corresponds to an automatically learnt HTP; a minimal Viterbi decoding sketch for assigning such state labels follows these notes.
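
As a concrete companion to note 7, the following minimal sketch fits a two-component Poisson mixture to skewed, discrete per-window action counts with EM. It is a generic, textbook-style mixture fit on toy data, not the chapter's actual model or the segcats implementation.

```python
import numpy as np
from scipy.stats import poisson

# Toy, skewed action counts: many low-activity windows plus some busy ones.
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(1.0, 300), rng.poisson(12.0, 100)])

K = 2
lam = np.array([counts.mean() * 0.5, counts.mean() * 1.5])  # initial Poisson rates
pi = np.full(K, 1.0 / K)                                    # initial mixture weights

for _ in range(100):
    # E-step: posterior responsibility of each component for each count
    probs = pi * poisson.pmf(counts[:, None], lam)          # shape (N, K)
    resp = probs / probs.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixture weights and component rates
    nk = resp.sum(axis=0)
    pi = nk / len(counts)
    lam = (resp * counts[:, None]).sum(axis=0) / nk

print("mixture weights:", pi.round(3), "Poisson rates:", lam.round(2))
```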
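
As a companion to note 8: segcats builds on scikit-learn 0.14.1, whose HMM code has since been spun off into the separate hmmlearn package. The sketch below shows the general recipe of k-means++ initialisation (Arthur and Vassilvitskii 2007) followed by Baum-Welch/EM re-estimation of a Gaussian-emission HMM; the feature layout, number of states, and the use of hmmlearn are assumptions for illustration, not the segcats code itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn.hmm import GaussianHMM  # successor to scikit-learn's old hmm module

# Toy per-window feature vectors (e.g. keystroke/fixation counts) for two sessions.
rng = np.random.default_rng(1)
X = rng.poisson([2.0, 5.0, 1.0], size=(500, 3)).astype(float)
lengths = [250, 250]          # observations per session (sequence boundaries)

N_STATES = 4                  # number of candidate HTPs to induce

# k-means++ clustering provides the initial emission means for the HMM states.
km = KMeans(n_clusters=N_STATES, init="k-means++", n_init=10, random_state=1).fit(X)

hmm = GaussianHMM(n_components=N_STATES, covariance_type="diag", n_iter=50,
                  init_params="stc")              # initialise all params except the means
hmm.means_ = km.cluster_centers_                  # seed emission means from k-means++
hmm.fit(X, lengths)                               # Baum-Welch (EM) re-estimation

print("learnt state means:\n", hmm.means_.round(2))
```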
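
Finally, as a companion to note 10, here is a minimal log-space Viterbi decoder (Viterbi 1967) that assigns the most likely HMM state, i.e. an automatically learnt HTP label, to each observation window. The state names and toy parameters are hypothetical; in practice the transition and emission scores come from the trained model.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state sequence; log_emit[t, s] is the log-likelihood of
    observation t under state s."""
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)        # best log-score of a path ending in each state
    backptr = np.zeros((T, S), dtype=int)   # predecessor state on that best path
    delta[0] = log_start + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans    # (previous state, current state)
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = [int(delta[-1].argmax())]        # best final state, then follow back-pointers
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]

# Hypothetical 4-state model with "sticky" transitions and random toy emission scores.
states = ["orientation", "drafting", "revision", "pause"]
log_start = np.log([0.7, 0.1, 0.1, 0.1])
log_trans = np.log(np.full((4, 4), 0.05) + np.eye(4) * 0.80)   # each row sums to 1
log_emit = np.log(np.random.default_rng(2).dirichlet(np.ones(4), size=10))

print([states[s] for s in viterbi(log_start, log_trans, log_emit)])
```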

References

  • Alabau, V., Buck, C., Carl, M., Casacuberta, F., García-Martínez, M., Germann, U., et al. (2014). Casmacat: A computer-assisted translation workbench. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden (pp. 25–28).


  • Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA (pp. 1027–1035).


  • Baum, L. E. (1972). An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. In Proceedings of the 3rd Symposium on Inequalities, Los Angeles, CA (pp. 1–8).


  • Carl, M. (2010). A computational framework for a cognitive model of human translation processes. In Proceedings of ASLIB Translating and the Computer (Vol. 32), London, UK.


  • Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and writing research. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey (pp. 4108–4112).


  • Carl, M., & Jakobsen, A. L. (2009). Towards statistical modelling of translators’ activity data. International Journal of Speech Technology, 12(4), 125–138.


  • Carl, M., & Kay, M. (2011). Gazing and typing activities during translation: A comparative study of translation units of professional and student translators. Meta, 56(4), 952–975.


  • Carl, M., García, M. M., & Mesa-Lao, B. (2014). CFT13: A resource for research into the post-editing process. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland (pp. 1757–1764).


  • Chinea-Rios, M., Sanchis-Trilles, G., Ortiz-Martínez, D., & Casacuberta, F. (2014). Online optimisation of log-linear weights in interactive machine translation. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland (pp. 3556–3559).


  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1–38.


  • Dragsted, B., & Carl, M. (2013). Towards a classification of translation styles based on eye-tracking and keylogging data. Journal of Writing Research, 5(1), 133–158.


  • Elming, J., Winther Balling, L., & Carl, M. (2014). Investigating user behaviour in post-editing and translation using the CASMACAT workbench. In S. O’Brien, L. Winther Balling, M. Carl, M. Simard, & L. Specia (Eds.), Post-editing of machine translation (pp. 147–169). Newcastle upon Tyne: Cambridge Scholars Publishing.


  • Federico, M., Bertoldi, N., Cettolo, M., Negri, M., Turchi, M., Trombetti, M., et al. (2014). The MateCat tool. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland (pp. 129–132).


  • Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378.


  • Gerloff, P. (1986). Second language learners’ reports on the interpretive process: Talk-aloud protocols of translation. In J. House & S. Blum-Kulka (Eds.), Interlingual and intercultural communication: Discourse and cognition in translation and second language acquisition studies (pp. 243–262). Tübingen: Narr.


  • Green, S., Heer, J., & Manning, C. D. (2013). The efficacy of human post-editing for language translation. In Proceedings of the 2013 ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Paris, France.


  • Guerberof, A. (2009). Productivity and quality in the post-editing of outputs from translation memories and machine translation. International Journal of Localisation, 7(1), 11–21.


  • Hvelplund, K. (2011). Allocation of cognitive resources in translation: An eye-tracking and key-logging study. Ph.D. thesis, Copenhagen Business School, Copenhagen, Denmark.


  • Jakobsen, A. L. (1999). Logging target text production with Translog. Copenhagen Studies in Language, 24, 9–20.


  • Jakobsen, A. L. (2003). Effects of think aloud on translation speed, revision and segmentation. In F. Alves (Ed.), Triangulating translation. Benjamins translation library (Vol. 45, pp. 69–95). Amsterdam, Netherlands: John Benjamins.


  • Jakobsen, A. L. (2011). Tracking translators’ keystrokes and eye movements with Translog. In C. Alvstad, A. Hild, & E. Tiselius (Eds.), Methods and strategies of process research: Integrative approaches in translation studies. Benjamins translation library (Vol. 94, pp. 37–56). Amsterdam, Netherlands: John Benjamins.


  • Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.


  • Koehn, P., & Germann, U. (2014). The impact of machine translation quality on human post-editing. In Proceedings of the Workshop on Humans and Computer-assisted Translation (HaCaT), Gothenburg, Sweden (pp 38–46).


  • Koponen, M., Aziz, W., Ramos, L., & Specia, L. (2012). Post-editing time as a measure of cognitive effort. In Proceedings of the AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP), San Diego, CA (pp. 11–20).


  • Krings, H. P. (1995). Texte reparieren. Empirische Untersuchungen zum Prozeß der Nachredaktion von Maschinenübersetzungen. Habilitation thesis, University of Hildesheim, Hildesheim, Germany.


  • Krings, H. P. (2001). Repairing texts: Empirical investigations of machine translation post-editing processes. Kent, OH: Kent State University Press.


  • Krings, H. P. (2005). Wege ins Labyrinth – Fragestellungen und Methoden der Übersetzungsprozessforschung im Überblick. Meta, 50(2), 342–358.


  • Läubli, S. (2014). Statistical modelling of human translation processes. Master’s thesis, University of Edinburgh, Edinburgh, UK.


  • Läubli, S., Fishel, M., Massey, G., Ehrensberger-Dow, M., & Volk, M. (2013). Assessing post-editing efficiency in a realistic translation environment. In Proceedings of the 2nd Workshop on Post-Editing Technology and Practice (WPTP), Nice, France (pp. 83–91).


  • Martínez-Gómez, P., Minocha, A., Huang, J., Carl, M., Bangalore, S., & Aizawa, A. (2014). Recognition of translator expertise using sequences of fixations and keystrokes. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), Safety Harbor, FL (pp. 229–302).


  • Massey, G., & Ehrensberger-Dow, M. (2014). Looking beyond the text: The usefulness of translation process data. In D. Knorr, C. Heine, & J. Engberg (Eds.), Methods in writing process research (pp. 81–89). Frankfurt am Main, Germany: Peter Lang.


  • O’Brien, S. (2006). Pauses as indicators of cognitive effort in post-editing machine translation output. Across Languages and Cultures, 7(1), 1–21.


  • O’Brien, S. (2007). Eye-tracking and translation memory matches. Perspectives: Studies in Translatology, 14(3), 185–205.


  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., & Grisel, O. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.


  • Perrin, D. (2003). Progression analysis (PA): Investigating writing strategies at the workplace. Journal of Pragmatics, 35(6), 907–921.


  • Plitt, M., & Masselot, F. (2010). A productivity test of statistical machine translation post-editing in a typical localisation context. Prague Bulletin of Mathematical Linguistics, 93, 7–16.


  • Sanchis-Trilles, G., Ortiz-Martínez, D., & Casacuberta, F. (2014). Efficient wordgraph pruning for interactive translation prediction. In Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT), Dubrovnik, Croatia (pp. 27–34).


  • Schaeffer, M., & Carl, M. (2014). Measuring the cognitive effort of literal translation processes. In Proceedings of the Workshop on Humans and Computer-assisted Translation (HaCaT), Gothenburg, Sweden (pp. 29–37).


  • Shmueli, G., Minka, T. P., Kadane, J. B., Borle, S., & Boatwright, P. (2005). A useful distribution for fitting discrete data: Revival of the Conway–Maxwell–Poisson distribution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1), 127–142.


  • Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257–268.


  • Toury, G. (1995). Descriptive translation studies and beyond. Benjamins translation library (Vol. 4). Amsterdam, Netherlands: John Benjamins.


  • Underwood, N., Mesa-Lao, B., Martinez, M. G., Carl, M., Alabau, V., Gonzalez-Rubio, J., et al. (2014). Evaluating the effects of interactivity in a post-editing workbench. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland (pp. 553–559).


  • Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.



Acknowledgements

This work was supported in part by the European Union Seventh Framework Programme for Research, Technological Development and Demonstration (FP7/2007–2013) under grant agreement no. 287576 (Casmacat).

Author information


Corresponding author

Correspondence to Samuel Läubli.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Läubli, S., Germann, U. (2016). Statistical Modelling and Automatic Tagging of Human Translation Processes. In: Carl, M., Bangalore, S., Schaeffer, M. (eds) New Directions in Empirical Translation Process Research. New Frontiers in Translation Studies. Springer, Cham. https://doi.org/10.1007/978-3-319-20358-4_8


  • DOI: https://doi.org/10.1007/978-3-319-20358-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20357-7

  • Online ISBN: 978-3-319-20358-4

  • eBook Packages: Computer Science (R0)
