Multimodal Corpus Analysis of Autoblog 2020: Lecture Videos in Machine Learning

Hernandez, Abner; Yang, Seung Hee

doi:10.1007/978-3-030-87802-3_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper introduces a lecture video corpus, Autoblog 2020. With the increase of online learning at universities, there is demand for a systematic toolchain for lecture video processing. However, existing lecture video corpora do not meet the requirements of such tasks, and lecture transcription and analysis remain relatively unexplored in speech and natural language research. The Autoblog 2020 corpus is developed toward the end goal of free video-to-blog-post conversion software that makes video presentations more accessible. The software will include automatic editing of disfluencies, automatic speech recognition (ASR), and spoken term extraction so that researchers can process and share their content more efficiently. In this paper, we present a description of the corpus, linguistic analyses, and preliminary experimental results on ASR, keyword extraction, and segmentation. The results will be used in future work to develop the video-to-blog-post conversion software.

Notes

1. https://autoblog.tf.fau.de

Acknowledgments

This work was supported by the Deutscher Akademischer Austauschdienst (DAAD) in the International Programmes Digital (IP Digital).

Author information

Authors and Affiliations

Pattern Recognition Lab, Friedrich Alexander University Erlangen-Nürnberg, Erlangen, Germany
Abner Hernandez
Department of Artificial Intelligence in Biomedical Engineering, Friedrich Alexander University Erlangen-Nürnberg, Erlangen, Germany
Seung Hee Yang

Corresponding author

Correspondence to Seung Hee Yang.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Hernandez, A., Yang, S.H. (2021). Multimodal Corpus Analysis of Autoblog 2020: Lecture Videos in Machine Learning. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science (LNAI), vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_24

DOI: https://doi.org/10.1007/978-3-030-87802-3_24
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
