Abstract
This paper introduces a lecture video corpus, Autoblog 2020. With the growth of online learning in universities, there is a demand for systematic toolchain development for lecture video processing. However, existing lecture video corpora do not satisfy the requirements of such tasks, and lecture transcription and analysis are relatively unexplored areas in speech and natural language research. The Autoblog 2020 corpus is developed towards the end goal of free video-to-blog-post conversion software that makes video presentations more accessible. The software will include automatic editing of disfluencies, automatic speech recognition (ASR), and spoken term extraction so that researchers can process and share their content more efficiently. In this paper, we present a description of the corpus, linguistic analyses, and preliminary experimental results on ASR, keyword extraction, and segmentation. The results will be used in future work to develop the video-to-blog-post conversion software.
Acknowledgments
This work was supported by the Deutscher Akademischer Austauschdienst (DAAD) in the International Programmes Digital (IP Digital).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hernandez, A., Yang, S.H. (2021). Multimodal Corpus Analysis of Autoblog 2020: Lecture Videos in Machine Learning. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_24
DOI: https://doi.org/10.1007/978-3-030-87802-3_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science (R0)