On the Use of Automatic Speech Recognition for Spoken Information Retrieval from Video Databases

  • Luis R. Salgado-Garza
  • Juan A. Nolazco-Flores
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3287)

Abstract

This paper describes the implementation of a spoken information retrieval system and its application to word search in an indexed video database. The system uses automatic speech recognition (ASR) software to convert the audio track of a video file into a transcript, and a document indexing tool then indexes that transcript. A spoken query, uttered by any user, is passed to the ASR system, which decodes the audio signal and proposes a hypothesis that is used to formulate a query against the indexed database. The final output of the system is a list of video frame tags whose audio corresponds to the spoken query. The speech recognition system achieved a Word Error Rate (WER) below 15%, and its combined operation with the document indexing system showed outstanding performance with spoken queries.
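The pipeline summarized above (transcript, index, query hypothesis, list of frame tags) and the WER metric can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the ASR software or the indexing tool, so the sketch assumes the transcript is already available as (frame tag, text) pairs and uses a plain in-memory inverted index. The function names `word_error_rate`, `build_index`, and `search`, the time-stamp format of the frame tags, and the sample sentences are all hypothetical.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def build_index(segments):
    """Map each transcript word to the set of frame tags where it occurs.

    `segments` is an iterable of (frame_tag, transcript_text) pairs; pairing
    text with frame tags this way is an assumption of the sketch.
    """
    index = defaultdict(set)
    for frame_tag, text in segments:
        for word in text.lower().split():
            index[word].add(frame_tag)
    return index


def search(index, query_hypothesis: str):
    """Return the frame tags whose transcript contains every query word."""
    words = query_hypothesis.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical transcript segments standing in for ASR output.
segments = [
    ("00:00:05", "the weather report for monday"),
    ("00:01:10", "sports news and weather"),
    ("00:02:30", "stock market closes higher"),
]
index = build_index(segments)

print(search(index, "weather"))         # ['00:00:05', '00:01:10']
print(search(index, "weather report"))  # ['00:00:05']
print(word_error_rate("the weather report for monday",
                      "the weather report on monday"))  # 0.2
```

In this sketch a recognition error in the query hypothesis simply yields fewer or no hits, which is one reason the WER of the recognizer matters for retrieval quality.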

Keywords

Speech Recognition; Language Model; Audio Signal; Automatic Speech Recognition; Information Retrieval System

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Luis R. Salgado-Garza (1)
  • Juan A. Nolazco-Flores (1)
  1. Computer Science Department, ITESM, Campus Monterrey, Monterrey, México
