AI-Based Visualization of Voice Characteristics in Lecture Videos’ Captions

  • Conference paper
  • First Online:
Artificial Intelligence in Education Technologies: New Development and Innovative Practices (AIET 2022)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 154)

Abstract

More and more educational institutions are making lecture videos available online. Since more than 100 empirical studies document that captioning a video improves comprehension of, attention to, and memory for the video [1], it makes sense to provide those lecture videos with captions. However, studies also show that in human communication the words themselves contribute only 7% to making a message clear, while how we say those words (tone, intonation, and verbal pace) contributes 38% [2]. Consequently, in this paper we address the question of whether an AI-based visualization of voice characteristics in captions helps students further improve their watching and learning experience with lecture videos. For the AI-based visualization of the speaker's voice characteristics in the captions we use the WaveFont technology [3–5], which processes the voice signal and intuitively displays loudness, speed, and pauses in the caption font. In our survey of 48 students, a significant majority of participants preferred the WaveFont captions for watching lecture videos in every surveyed category: visualization of voice characteristics, understanding the content, following the content, linguistic understanding, and identifying important words.
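WaveFont itself is described in [3–5] and its exact pipeline is not part of this page. As a rough, hypothetical sketch of the general idea only (not the authors' actual method), per-word loudness can be estimated from time-aligned audio windows and mapped to a typographic parameter such as a numeric font weight, so that louder words render heavier:

```python
import math

def rms(samples):
    """Root-mean-square amplitude of one word's audio window."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def loudness_to_weight(value, lo, hi):
    """Linearly map an RMS value in [lo, hi] to a font weight in [300, 900]."""
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return int(300 + t * 600)

def caption_weights(word_windows):
    """word_windows: list of (word, samples) pairs, e.g. from a forced aligner.

    Returns (word, font_weight) pairs, normalized over the utterance so the
    quietest word gets weight 300 and the loudest gets 900.
    """
    values = [rms(samples) for _, samples in word_windows]
    lo, hi = min(values), max(values)
    hi = hi if hi > lo else lo + 1.0  # avoid division by zero for flat input
    return [(word, loudness_to_weight(v, lo, hi))
            for (word, _), v in zip(word_windows, values)]
```

Speed and pauses, which WaveFont also visualizes, could analogously drive letter spacing and inter-word gaps; the names and the weight range above are illustrative assumptions, not part of the cited technology.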

Notes

  1. In this research, we refer to interlingual translations as subtitles and to same-language transcriptions as captions.

  2. BMBF funding number: 16DHB3006; running time 1.1.2020–31.12.2022.

  3. http://www.untertitelrichtlinien.de.

  4. https://bbc.github.io/subtitle-guidelines.

References

  1. Gernsbacher, M.A.: Video captions benefit everyone. Policy Insights Behav. Brain Sci. 2(1), 195–202 (2015)

  2. Marteney, J.: Verbal and nonverbal communication. ASCCC Open Educational Resources Initiative (OERI). https://socialsci.libretexts.org/@go/page/67152 (2020)

  3. Wölfel, M., Schlippe, T., Stitz, A.: Voice driven type design. In: International Conference on Speech Technology and Human-Computer Dialog (SpeD), Bucharest, Romania (2015)

  4. Schlippe, T., Wölfel, M., Stitz, A.: Generation of text from an audio speech signal. US Patent 10043519B2 (2018)

  5. Schlippe, T., Alessai, S., El-Taweel, G., Wölfel, M., Zaghouani, W.: Visualizing voice characteristics with type design in closed captions for Arabic, International Conference on Cyberworlds (CW 2020), Caen, France (2020)

  6. United Nations: Sustainable Development Goals: 17 goals to transform our world. https://www.un.org/sustainabledevelopment/sustainable-development-goals (2021)

  7. Correia, A.P., Liu, C., Xu, F.: Evaluating videoconferencing systems for the quality of the educational experience. Distance Educ. 41(4), 429–452 (2020)

  8. Koravuna, S., Surepally, U.K.: Educational gamification and artificial intelligence for promoting digital literacy. Association for Computing Machinery, New York, NY, USA (2020)

  9. Chen, L., Chen, P., Lin, Z.: Artificial intelligence in education: A review. IEEE Access 8, 75264–75278 (2020). https://doi.org/10.1109/ACCESS.2020.2988510

  10. Rakhmanov, O., Schlippe, T.: Sentiment analysis for Hausa: Classifying students’ comments. The 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022). Marseille, France (2022)

  11. Libbrecht, P., Declerck, T., Schlippe, T., Mandl, T., Schiffner, D.: NLP for student and teacher: Concept for an AI based information literacy tutoring system. In: The 29th ACM International Conference on Information and Knowledge Management (CIKM2020). Galway, Ireland (2020)

  12. Sawatzki, J., Schlippe, T., Benner-Wickner, M.: Deep learning techniques for automatic short answer grading: Predicting scores for English and German answers. In: The 2nd International Conference on Artificial Intelligence in Education Technology (AIET 2021). Wuhan, China (2021)

  13. Schlippe, T., Sawatzki, J.: Cross-lingual automatic short answer grading. In: The 2nd International Conference on Artificial Intelligence in Education Technology (AIET 2021). Wuhan, China (2021)

  14. Bothmer, K., Schlippe, T.: Investigating natural language processing techniques for a recommendation system to support employers, job seekers and educational institutions. In: The 23rd International Conference on Artificial Intelligence in Education (AIED) (2022)

  15. Bothmer, K., Schlippe, T.: Skill Scanner: Connecting and supporting employers, job seekers and educational institutions with an AI-based recommendation system. In: Proceedings of The Learning Ideas Conference 2022 (15th Annual Conference), New York, 15–17 June (2022)

  16. Schlippe, T., Sawatzki, J.: AI-based multilingual interactive exam preparation. In: Guralnick, D., Auer, M.E., Poce, A. (eds.) TLIC 2021. LNNS, vol. 349, pp. 396–408. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-90677-1_38

  17. Wölfel, M.: Towards the automatic generation of pedagogical conversational agents from lecture slides. In: International Conference on Multimedia Technology and Enhanced Learning (2021)

  18. Ou, C., Joyner, D.A., Goel, A.K.: Designing and developing video lessons for online learning: A seven-principle model. Online Learn. 23(2), 82–104 (2019)

  19. Wang, J., Antonenko, P., Dawson, K.: Does visual attention to the instructor in online video affect learning and learner perceptions? An eye-tracking analysis. Comput. Educ. 146 (2020). https://doi.org/10.1016/j.compedu.2019.103779

  20. Perego, E., Del Missier, F., Porta, M., Mosconi, M.: The cognitive effectiveness of subtitle processing. Media Psychol. 13, 243–272 (2010)

  21. Linebarger, D.L.: Learning to read from television: The effects of using captions and narration. J. Educ. Psychol. 93, 288–298 (2001)

  22. Bowe, F.G., Kaufman, A.: Captioned Media: Teacher Perceptions of Potential Value for Students with No Hearing Impairments: A National Survey of Special Educators. Described and Captioned Media Program, Spartanburg, SC (2001)

  23. Guo, P.J., Kim, J., Rubin, R.: How video production affects student engagement: An empirical study of MOOC videos. In: L@S '14: Proceedings of the First ACM Conference on Learning @ Scale, March 2014, pp. 41–50 (2014). https://doi.org/10.1145/2556325.2566239

  24. Alfayez, Z.H.: Designing educational videos for university websites based on students’ preferences. Online Learn. 25(2), 280–298 (2021)

  25. Persson, J.R., Wattengård, E., Lilledahl, M.B.: The effect of captions and written text on viewing behavior in educational videos. Int. J. Math Sci. Technol. Educ. 7(1), 124–147 (2019)

  26. Vy, Q.V., Fels, D.I.: Using placement and name for speaker identification in captioning. In: Miesenberger, K., Klaus, J., Zagler, W., Karshmer, A. (eds.) ICCHP 2010. LNCS, vol. 6179, pp. 247–254. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14097-6_40

  27. Brown, A., et al.: Dynamic subtitles: The user experience. In: TVX (2015)

  28. Fox, W.: Integrated titles: An improved viewing experience. In: Eyetracking and Applied Linguistics (2016)

  29. Ohene-Djan, J., Wright, J., Combie-Smith, K.: Emotional subtitles: A system and potential applications for deaf and hearing impaired people. In: CVHI (2007)

  30. Rashid, R., Aitken, J., Fels, D.: Expressing emotions using animated text captions. Web Design for Dyslexics: Accessibility of Arabic Content (2006)

  31. Bessemans, A., Renckens, M., Bormans, K., Nuyts, E., Larson, K.: Visual prosody supports reading aloud expressively. Visible Lang. 53, 28–49 (2019)

  32. Gernsbacher, M.: Video captions benefit everyone. Policy Insights Behav. Brain Sci. 2, 195–202 (2015)

  33. El-Taweel, G.: Conveying emotions in Arabic SDH: The case of pride and prejudice. Master thesis, Hamad Bin Khalifa University (2016)

  34. de Lacerda Pataca, C., Costa, P.D.P.: Speech modulated typography: Towards an affective representation model. In: International Conference on Intelligent User Interfaces, pp. 139–143 (2020)

  35. de Lacerda Pataca, C., Dornhofer Paro Costa, P.: Hidden bawls, whispers, and yelps: Can text be made to sound more than just its words? (2022). arXiv:2202.10631

  36. Bringhurst, R.: The elements of typographic style, vol. 3.2, pp. 55–56. Hartley and Marks Publishers (2008)

  37. Unger, G.: Wie man’s liest, pp. 63–65. Niggli Verlag (2006)

  38. Bai, Q., Dan, Q., Mu, Z., Yang, M.: A systematic review of emoji: Current research and future perspectives. Front. Psychol. 10, 2221 (2019). https://doi.org/10.3389/fpsyg.2019.02221

  39. Rayner, S.G.: Cognitive styles and learning styles. In: Wright, J.D. (ed.) International Encyclopedia of Social and Behavioral Sciences, vol. 4, 2nd edn, pp. 110–117. Elsevier, Oxford (2015)

Author information

Correspondence to Tim Schlippe.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Schlippe, T., Fritsche, K., Sun, Y., Wölfel, M. (2023). AI-Based Visualization of Voice Characteristics in Lecture Videos’ Captions. In: Cheng, E.C.K., Wang, T., Schlippe, T., Beligiannis, G.N. (eds) Artificial Intelligence in Education Technologies: New Development and Innovative Practices. AIET 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 154. Springer, Singapore. https://doi.org/10.1007/978-981-19-8040-4_8

  • DOI: https://doi.org/10.1007/978-981-19-8040-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8039-8

  • Online ISBN: 978-981-19-8040-4

  • eBook Packages: Engineering (R0)
