Segmentation and Annotation of Audiovisual Recordings Based on Automated Speech Recognition

Repp, Stephan; Waitelonis, Jörg; Sack, Harald; Meinel, Christoph

doi:10.1007/978-3-540-77226-2_63

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4881)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Searching multimedia data, in particular audiovisual data, is still a challenging task. The number of digital video recordings has increased dramatically as recording technology has become more affordable and network infrastructure has become mature enough to support download and streaming solutions. However, the accessibility and traceability of their content for further use remain rather limited. In this paper we describe and evaluate a new approach to synchronizing auxiliary text-based material, e.g. presentation slides, with lecture video recordings. Our goal is to show that a tentative transliteration, as produced by automated speech recognition, is sufficient for synchronization. Different approaches to synchronizing textual material with deficient transliterations of lecture recordings are discussed and evaluated. Our evaluation data set comprises recordings in different languages by various speakers.

Author information

Authors and Affiliations

Hasso-Plattner-Institut für Softwaresystemtechnik GmbH (HPI), P.O. Box 900460, D-14440 Potsdam, Germany
Stephan Repp & Christoph Meinel
Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2-4, D-07743 Jena, Germany
Jörg Waitelonis & Harald Sack

Editor information

Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne, Xin Yao

About this paper

Cite this paper

Repp, S., Waitelonis, J., Sack, H., Meinel, C. (2007). Segmentation and Annotation of Audiovisual Recordings Based on Automated Speech Recognition. In: Yin, H., Tino, P., Corchado, E., Byrne, W., Yao, X. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2007. IDEAL 2007. Lecture Notes in Computer Science, vol 4881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77226-2_63

DOI: https://doi.org/10.1007/978-3-540-77226-2_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77225-5
Online ISBN: 978-3-540-77226-2
eBook Packages: Computer Science, Computer Science (R0)
