SPECOM 2017: Speech and Computer, pp 820-828

What Speech Recognition Accuracy is Needed for Video Transcripts to be a Useful Search Interface?

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10458)

Abstract

Informative videos (e.g. recorded lectures) are increasingly being made available online, but they are difficult to use, browse and search. Popular platforms now let users search and navigate videos via a transcript, which, in order to guarantee a satisfactory level of word accuracy, has typically been generated with some manual input. The goal of our work is to take a step closer to the fully automatic generation of informative video transcripts based on current automatic speech recognition technology. We present a user study designed to better understand how viewers use video transcripts to search video content, with the aim of estimating the minimum word recognition accuracy needed for video captions to be a useful search interface. We found that transcripts with 70% word recognition accuracy are as effective as 100% accuracy transcripts in supporting video search when using single-word search. We also found large variations in the time it takes to search a video, independent of the quality of the transcript. With adequate and adapted search strategies, even low-accuracy transcripts can support quick video search.
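As a rough illustration of what a figure such as "70% word recognition accuracy" can mean, the sketch below computes word accuracy as one minus the word error rate, using a word-level edit distance between a reference transcript and a recognizer output. This is one common definition and is an assumption here, since the abstract does not state the exact metric used in the study. The function names and the example sentences are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn ref_words into hyp_words (Levenshtein distance)."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (edit distance / number of reference words).

    Assumed definition (1 - WER); case-insensitive, whitespace tokenized.
    The value can be negative if the hypothesis has many insertions.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return 1.0 - word_edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example: one substituted word and one dropped word.
    reference = "the quick brown fox jumps over the lazy dog today"
    hypothesis = "the quick brown box jumps over lazy dog today"
    print(f"word accuracy: {word_accuracy(reference, hypothesis):.2f}")
```

Under this definition, a transcript with two errors against a ten-word reference scores 0.80, so a 70% transcript has roughly three word errors for every ten reference words.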

Keywords

Speech recognition; Word accuracy; Video transcripts; Video search

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Beijing University of Posts and Telecommunications, Beijing, China
  2. Queen Mary University of London, London, UK
