What Speech Recognition Accuracy is Needed for Video Transcripts to be a Useful Search Interface?

Chao, Yang; Bourguet, Marie-Luce

doi:10.1007/978-3-319-66429-3_82

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Informative videos (e.g. recorded lectures) are increasingly being made available online, but they are difficult to use, browse and search. Popular platforms now let users search and navigate videos via a transcript, which has typically been generated with some manual input in order to guarantee a satisfactory level of word accuracy. The goal of our work is to move a step closer to the fully automatic generation of informative video transcripts based on current automatic speech recognition technology. We present a user study designed to better understand how viewers use video transcripts to search video content, with the aim of estimating the minimum word recognition accuracy needed for video captions to be a useful search interface. We found that transcripts with 70% word recognition accuracy are as effective as 100% accuracy transcripts in supporting video search when using single-word search. We also found large variations in the time it takes to search a video, independent of the quality of the transcript. With suitably adapted search strategies, even low-accuracy transcripts can support quick video search.
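To make the two notions in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes word recognition accuracy is computed as one minus the word error rate (a common convention; the chapter may define it differently) and that a transcript is a list of timestamped segments. The names `word_accuracy` and `search_transcript`, and the segment layout, are hypothetical.

```python
from typing import List, Tuple


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, with WER = word-level edit distance / reference length.

    Assumption: this is one common definition of word accuracy, not
    necessarily the one used in the study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Dynamic-programming Levenshtein distance computed over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a spurious word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 1.0 - prev[-1] / len(ref)


def search_transcript(segments: List[Tuple[float, str]], word: str) -> List[float]:
    """Return start times (seconds) of segments that contain the query word.

    `segments` is a hypothetical layout: (start_time_seconds, text) pairs.
    """
    query = word.lower()
    return [start for start, text in segments if query in text.lower().split()]


if __name__ == "__main__":
    print(word_accuracy("the cat sat on the mat", "the cat sat on a mat"))
    demo = [
        (0.0, "welcome to the lecture"),
        (12.5, "speech recognition is hard"),
        (30.0, "recognition errors matter"),
    ]
    print(search_transcript(demo, "recognition"))
```

Under this definition, many inserted words can push the accuracy below zero, and the search matches whole whitespace-separated tokens only, so punctuation attached to a word would prevent a match. A single-word query still succeeds on an imperfect transcript whenever the recognizer gets that word right in at least one relevant segment.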

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunication, Beijing, China
Yang Chao
Queen Mary University of London, London, UK
Marie-Luce Bourguet

Corresponding author

Correspondence to Marie-Luce Bourguet.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Chao, Y., Bourguet, ML. (2017). What Speech Recognition Accuracy is Needed for Video Transcripts to be a Useful Search Interface? In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_82

DOI: https://doi.org/10.1007/978-3-319-66429-3_82
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
