Using texts generated by STR and CAT to facilitate student comprehension of lecture content in a foreign language


In this study, we combined speech-to-text recognition (STR) and computer-aided translation (CAT) technologies during lectures given in English as a foreign language to facilitate student comprehension of the lecture content. The instructor lectured in English, the STR system generated texts from the voice input, and the CAT system simultaneously translated the STR texts into the students’ native language. To test the feasibility of this approach, we designed an experiment with three groups of twenty students each. All students attended the same lectures: (a) students in the control group attended lectures without any support, (b) students in experimental group 1 attended lectures with STR support (i.e., they were presented with texts in English generated from the instructor’s speech by STR), and (c) students in experimental group 2 attended lectures with STR and CAT support (i.e., they were presented with texts in their native language that CAT translated from the English STR texts). We compared the posttest results of the three groups. We also explored the effects of our approach with respect to different levels of foreign language ability. Finally, we surveyed the perceptions of students in experimental group 2 about the usefulness of the translated texts for their learning. Our results showed that applying STR and CAT technologies together was a useful approach: the translated texts significantly improved student learning performance compared to that of the students in the control group.
The translated texts were beneficial because the students were able (a) to confirm words that the instructor had not spoken clearly or to find the meaning of unfamiliar words and (b) to complement the spoken lecture content with translated content, which helped information processing and enhanced comprehension. When comparing students with different language abilities, we found that students with low language ability benefited the most from the translated texts. Before the experiment, the low-ability students’ language ability was significantly lower than that of the high-ability students; after the experiment, however, the learning performance of the low-ability students did not differ significantly from that of the high-ability students. Finally, most students perceived the translated texts as useful for their learning and intended to use them for learning purposes in the future.



There is no funding to report for this study.

Author information



Corresponding author

Correspondence to Ai Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

  1. The treatment improves my understanding of a lecture.

  2. The treatment increases my productivity during a lecture.

  3. The treatment enhances my learning effectiveness during a lecture.

  4. The treatment improves my learning performance during a lecture.

  5. The treatment helps me accomplish a learning task more quickly.

  6. Overall, I found the treatment to be useful during a lecture.

  7. I intend to continue using the treatment for learning in the future.

  8. I plan to use the treatment for learning often.

  9. I will strongly recommend others to use the treatment for learning.

Appendix 2



Hello everyone, today I am going to talk about photography. Do you have a camera? Do you enjoy taking pictures? Daniel and Winnie are taking pictures today, so we are learning about photography.

Anyone can be a photographer. You just need a camera. Daniel has a small camera. Winnie has a big camera.



Привет всем, сегодня я собираюсь поговорить о фотографии. У тебя есть камера? Вам нравится фотографировать? Дэниел и Винни сегодня фотографируются, поэтому мы учимся фотографии.

Любой может быть фотографом. Вам просто нужна камера. У Даниэля маленькая камера. У Винни большая камера.

About this article

Cite this article

Shadiev, R., Sun, A. Using texts generated by STR and CAT to facilitate student comprehension of lecture content in a foreign language. J Comput High Educ 32, 561–581 (2020).

Keywords


  • Comprehension
  • Lecture
  • Foreign language
  • STR
  • CAT