Multimodal Corpus Analysis of Autoblog 2020: Lecture Videos in Machine Learning

  • Conference paper
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

This paper introduces a lecture video corpus, Autoblog 2020. With the growth of online learning in universities, there is demand for a systematic toolchain for lecture video processing. However, existing lecture video corpora do not meet the requirements of such tasks, and lecture transcription and analysis remain relatively unexplored in speech and natural language research. The Autoblog 2020 corpus is developed towards the end goal of free video-to-blog-post conversion software that makes video presentations more accessible. The software will include automatic editing of disfluencies, automatic speech recognition (ASR), and spoken term extraction so that researchers can process and share their content more efficiently. In this paper, we present a description of the corpus, linguistic analyses, and preliminary experimental results on ASR, keyword extraction, and segmentation. The results will be used in future work to develop the video-to-blog-post conversion software.



This work was supported by the Deutscher Akademischer Austauschdienst (DAAD) in the International Programmes Digital (IP Digital).

Corresponding author

Correspondence to Seung Hee Yang.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Hernandez, A., Yang, S.H. (2021). Multimodal Corpus Analysis of Autoblog 2020: Lecture Videos in Machine Learning. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science (R0)
