What Speech Recognition Accuracy is Needed for Video Transcripts to be a Useful Search Interface?

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

Informative videos (e.g. recorded lectures) are increasingly being made available online, but they are difficult to use, browse and search. Popular platforms now let users search and navigate videos via a transcript, which, to guarantee a satisfactory level of word accuracy, has typically been generated with some manual input. The goal of our work is to take a step closer to the fully automatic generation of informative video transcripts based on current automatic speech recognition technology. We present a user study designed to better understand how viewers use video transcripts to search video content, with the aim of estimating the minimum word recognition accuracy needed for video captions to be a useful search interface. We found that, for single-word search, transcripts with 70% word recognition accuracy are as effective as transcripts with 100% accuracy in supporting video search. We also found large variations in the time it takes to search a video, regardless of the quality of the transcript. With adequate, adapted search strategies, even low-accuracy transcripts can support quick video search.
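The abstract does not define how "word recognition accuracy" is measured or how single-word search is implemented. As an illustrative sketch only, the Python below assumes the common definition accuracy = 1 − WER, where WER = (S + D + I) / N is computed from a word-level Levenshtein alignment against a reference transcript of N words. It also models single-word search as exact, case-insensitive matching over a transcript stored as (start time, word) pairs. The function names `word_errors`, `word_accuracy` and `find_word` and the transcript representation are hypothetical, not the authors' method.

```python
from typing import List, Tuple


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum substitutions + deletions + insertions needed to align the
    hypothesis with the reference (Levenshtein distance over words)."""
    n, m = len(reference), len(hypothesis)
    # prev[j] = edit distance between the reference prefix handled so far
    # and hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[m]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: accuracy = 1 - WER = 1 - (S + D + I) / N.
    Can fall below 0 when the hypothesis contains many insertions."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - word_errors(ref, hyp) / len(ref)


def find_word(transcript: List[Tuple[float, str]], query: str) -> List[float]:
    """Hypothetical single-word search: return the start time (in seconds)
    of every transcript word matching the query, ignoring case."""
    q = query.lower()
    return [start for start, word in transcript if word.lower() == q]
```

For example, `word_accuracy("the cat sat on the mat", "the cat sat on a mat")` has one substitution over six reference words, giving an accuracy of about 0.83. `find_word([(0.0, "Speech"), (12.5, "speech")], "speech")` returns both start times, which a player could use as jump targets. Exact matching is the simplest choice here. It also shows why a misrecognised occurrence of the query word is invisible to search while correctly recognised occurrences elsewhere in the video can still be found.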

Author information

Corresponding author

Correspondence to Marie-Luce Bourguet.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chao, Y., Bourguet, ML. (2017). What Speech Recognition Accuracy is Needed for Video Transcripts to be a Useful Search Interface? In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_82

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_82

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
