AI-Based Visualization of Voice Characteristics in Lecture Videos’ Captions

  • Conference paper
  • First Online:
Artificial Intelligence in Education Technologies: New Development and Innovative Practices (AIET 2022)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 154)

Abstract

More and more educational institutions are making lecture videos available online. Since more than 100 empirical studies document that captioning a video improves comprehension of, attention to, and memory for the video [1], it makes sense to provide those lecture videos with captions. However, studies also show that in human communication the words themselves contribute only 7% to making a message clear, while how we say those words (tone, intonation, and verbal pace) contributes 38% [2]. Consequently, in this paper we address the question of whether an AI-based visualization of voice characteristics in captions helps students further improve their watching and learning experience with lecture videos. For the AI-based visualization of the speaker's voice characteristics in the captions we use the WaveFont technology [3–5], which processes the voice signal and intuitively displays loudness, speed, and pauses in the caption font. In our survey of 48 students, a significant majority of participants preferred the WaveFont captions for watching lecture videos in every surveyed category: visualization of voice characteristics, understanding the content, following the content, linguistic understanding, and identifying important words.
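WaveFont itself is described in [3–5] and its exact pipeline is not part of this page. As a rough, hypothetical sketch of the general idea only (not the authors' actual method), per-word loudness can be estimated from time-aligned audio windows and mapped to a typographic parameter such as a numeric font weight, so that louder words render heavier:

```python
import math

def rms(samples):
    """Root-mean-square amplitude of one word's audio window."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudness_to_weight(value, lo, hi):
    """Linearly map an RMS value in [lo, hi] to a font weight in [300, 900]."""
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return int(300 + t * 600)

def caption_weights(word_windows):
    """word_windows: list of (word, samples) pairs, e.g. from a forced aligner.

    Returns (word, font_weight) pairs, normalized over the utterance so the
    quietest word gets weight 300 and the loudest gets 900.
    """
    values = [rms(samples) for _, samples in word_windows]
    lo, hi = min(values), max(values)
    hi = hi if hi > lo else lo + 1.0  # avoid division by zero for flat input
    return [(word, loudness_to_weight(v, lo, hi))
            for (word, _), v in zip(word_windows, values)]
```

Speed and pauses, which WaveFont also visualizes, could analogously drive letter spacing and inter-word gaps; the names and the weight range above are illustrative assumptions, not part of the cited technology.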

Notes

  1. In this research, we refer to interlingual translations as subtitles and to same-language transcriptions as captions.

  2. BMBF funding number: 16DHB3006; running time 1.1.2020–31.12.2022.

  3. http://www.untertitelrichtlinien.de.

  4. https://bbc.github.io/subtitle-guidelines.

References

  1. Gernsbacher, M.A.: Video captions benefit everyone. Policy Insights Behav. Brain Sci. 2(1), 195–202 (2015)

  2. Marteney, J.: Verbal and nonverbal communication. ASCCC Open Educational Resources Initiative (OERI). https://socialsci.libretexts.org/@go/page/67152 (2020)

  3. Wölfel, M., Schlippe, T., Stitz, A.: Voice driven type design. In: International Conference on Speech Technology and Human-Computer Dialog (SpeD), Bucharest, Romania (2015)

  4. Schlippe, T., Wölfel, M., Stitz, A.: Generation of text from an audio speech signal. US Patent 10043519B2 (2018)

  5. Schlippe, T., Alessai, S., El-Taweel, G., Wölfel, M., Zaghouani, W.: Visualizing voice characteristics with type design in closed captions for Arabic, International Conference on Cyberworlds (CW 2020), Caen, France (2020)

  6. United Nations: Sustainable Development Goals: 17 goals to transform our world. https://www.un.org/sustainabledevelopment/sustainable-development-goals (2021)

  7. Correia, A.P., Liu, C., Xu, F.: Evaluating videoconferencing systems for the quality of the educational experience. Distance Educ. 41(4), 429–452 (2020)

  8. Koravuna, S., Surepally, U.K.: Educational gamification and artificial intelligence for promoting digital literacy. Association for Computing Machinery, New York, NY, USA (2020)

  9. Chen, L., Chen, P., Lin, Z.: Artificial intelligence in education: A review. IEEE Access 8, 75264–75278 (2020). https://doi.org/10.1109/ACCESS.2020.2988510

  10. Rakhmanov, O., Schlippe, T.: Sentiment analysis for Hausa: Classifying students’ comments. The 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022). Marseille, France (2022)

  11. Libbrecht, P., Declerck, T., Schlippe, T., Mandl, T., Schiffner, D.: NLP for student and teacher: Concept for an AI based information literacy tutoring system. In: The 29th ACM International Conference on Information and Knowledge Management (CIKM2020). Galway, Ireland (2020)

  12. Sawatzki, J., Schlippe, T., Benner-Wickner, M.: Deep learning techniques for automatic short answer grading: Predicting scores for English and German answers. In: The 2nd International Conference on Artificial Intelligence in Education Technology (AIET 2021). Wuhan, China (2021)

  13. Schlippe, T., Sawatzki, J.: Cross-lingual automatic short answer grading. In: The 2nd International Conference on Artificial Intelligence in Education Technology (AIET 2021). Wuhan, China (2021)

  14. Bothmer, K., Schlippe, T.: Investigating natural language processing techniques for a recommendation system to support employers, job seekers and educational institutions. In: The 23rd International Conference on Artificial Intelligence in Education (AIED) (2022)

  15. Bothmer, K., Schlippe, T.: Skill Scanner: Connecting and supporting employers, job seekers and educational institutions with an AI-based recommendation system. In: Proceedings of The Learning Ideas Conference 2022 (15th Annual Conference), New York, 15–17 June (2022)

  16. Schlippe, T., Sawatzki, J.: AI-based multilingual interactive exam preparation. In: Guralnick, D., Auer, M.E., Poce, A. (eds.) TLIC 2021. LNNS, vol. 349, pp. 396–408. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-90677-1_38

  17. Wölfel, M.: Towards the automatic generation of pedagogical conversational agents from lecture slides. In: International Conference on Multimedia Technology and Enhanced Learning (2021)

  18. Ou, C., Joyner, D.A., Goel, A.K.: Designing and developing video lessons for online learning: A seven-principle model. Online Learn. 23(2), 82–104 (2019)

  19. Wang, J., Antonenko, P., Dawson, K.: Does visual attention to the instructor in online video affect learning and learner perceptions? An eye-tracking analysis. Comput. Educ. 146 (2020). https://doi.org/10.1016/j.compedu.2019.103779

  20. Perego, E., Del Missier, F., Porta, M., Mosconi, M.: The cognitive effectiveness of subtitle processing. Media Psychol. 13, 243–272 (2010)

  21. Linebarger, D.L.: Learning to read from television: The effects of using captions and narration. J. Educ. Psychol. 93, 288–298 (2001)

  22. Bowe, F.G., Kaufman, A.: Captioned Media: Teacher Perceptions of Potential Value for Students with No Hearing Impairments: A National Survey of Special Educators. Described and Captioned Media Program, Spartanburg, SC (2001)

  23. Guo, P.J., Kim, J., Rubin, R.: How video production affects student engagement: An empirical study of MOOC videos. In: L@S '14: Proceedings of the First ACM Conference on Learning @ Scale, March 2014, pp. 41–50 (2014). https://doi.org/10.1145/2556325.2566239

  24. Alfayez, Z.H.: Designing educational videos for university websites based on students’ preferences. Online Learn. 25(2), 280–298 (2021)

  25. Persson, J.R., Wattengård, E., Lilledahl, M.B.: The effect of captions and written text on viewing behavior in educational videos. Int. J. Math Sci. Technol. Educ. 7(1), 124–147 (2019)

  26. Vy, Q.V., Fels, D.I.: Using placement and name for speaker identification in captioning. In: Miesenberger, K., Klaus, J., Zagler, W., Karshmer, A. (eds.) ICCHP 2010. LNCS, vol. 6179, pp. 247–254. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14097-6_40

  27. Brown, A., et al.: Dynamic subtitles: The user experience. In: TVX (2015)

  28. Fox, W.: Integrated titles: An improved viewing experience. In: Eyetracking and Applied Linguistics (2016)

  29. Ohene-Djan, J., Wright, J., Combie-Smith, K.: Emotional subtitles: A system and potential applications for deaf and hearing impaired people. In: CVHI (2007)

  30. Rashid, R., Aitken, J., Fels, D.: Expressing emotions using animated text captions. Web Design for Dyslexics: Accessibility of Arabic Content (2006)

  31. Bessemans, A., Renckens, M., Bormans, K., Nuyts, E., Larson, K.: Visual prosody supports reading aloud expressively. Visible Lang. 53, 28–49 (2019)

  32. Gernsbacher, M.: Video captions benefit everyone. Policy Insights Behav. Brain Sci. 2, 195–202 (2015)

  33. El-Taweel, G.: Conveying emotions in Arabic SDH: The case of pride and prejudice. Master thesis, Hamad Bin Khalifa University (2016)

  34. de Lacerda Pataca, C., Costa, P.D.P.: Speech modulated typography: Towards an affective representation model. In: International Conference on Intelligent User Interfaces, pp. 139–143 (2020)

  35. de Lacerda Pataca, C., Dornhofer Paro Costa, P.: Hidden bawls, whispers, and yelps: Can text be made to sound more than just its words? (2022). arXiv:2202.10631

  36. Bringhurst, R.: The elements of typographic style, vol. 3.2, pp. 55–56. Hartley and Marks Publishers (2008)

  37. Unger, G.: Wie man’s liest, pp. 63–65. Niggli Verlag (2006)

  38. Bai, Q., Dan, Q., Mu, Z., Yang, M.: A systematic review of emoji: Current research and future perspectives. Front. Psychol. 10, 2221 (2019). https://doi.org/10.3389/fpsyg.2019.02221

  39. Rayner, S.G.: Cognitive styles and learning styles. In: Wright, J.D. (ed.) International Encyclopedia of Social and Behavioral Sciences, vol. 4, 2nd edn, pp. 110–117. Elsevier, Oxford (2015)

Author information

Correspondence to Tim Schlippe.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Schlippe, T., Fritsche, K., Sun, Y., Wölfel, M. (2023). AI-Based Visualization of Voice Characteristics in Lecture Videos’ Captions. In: Cheng, E.C.K., Wang, T., Schlippe, T., Beligiannis, G.N. (eds) Artificial Intelligence in Education Technologies: New Development and Innovative Practices. AIET 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 154. Springer, Singapore. https://doi.org/10.1007/978-981-19-8040-4_8

  • DOI: https://doi.org/10.1007/978-981-19-8040-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8039-8

  • Online ISBN: 978-981-19-8040-4

  • eBook Packages: Engineering (R0)
