Abstract
This document describes the implementation of a spoken information retrieval system and its application to word search in an indexed video database. The system uses automatic speech recognition (ASR) software to convert the audio signal of a video file into a transcript, and a document indexing tool to index that transcript. A spoken query, uttered by any user, is then passed to the ASR system, which decodes the audio signal and proposes a hypothesis that is used to formulate a query against the indexed database. The final output of the system is a list of video frame tags whose audio corresponds to the spoken query. The speech recognition system achieved a Word Error Rate (WER) below 15%, and its combined operation with the document indexing system showed outstanding performance with spoken queries.
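The pipeline summarized above (transcribe, index, decode the spoken query, look up matching video tags) and the WER figure can be illustrated with a minimal sketch. This is not the authors' implementation: the ASR step is assumed to have already produced text, and the names `build_index`, `search`, `word_error_rate`, and the sample `segments` (with start times in seconds standing in for video frame tags) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def build_index(segments):
    """Map each lowercased word to the sorted tags of segments containing it.

    `segments` is a hypothetical list of (tag, transcript_text) pairs, where
    `tag` identifies a stretch of video (assumed here: start time in seconds).
    """
    index = defaultdict(set)
    for tag, text in segments:
        for word in text.lower().split():
            index[word].add(tag)
    return {word: sorted(tags) for word, tags in index.items()}


def search(index, query_hypothesis):
    """Return tags of segments containing every word of the decoded query."""
    words = query_hypothesis.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Invented sample transcripts standing in for ASR output on a video file.
segments = [
    (0.0, "the president spoke about the economy"),
    (12.5, "weather forecast for the weekend"),
    (30.0, "the economy grew last quarter"),
]
index = build_index(segments)
print(search(index, "the economy"))
print(word_error_rate("the economy grew", "the economy grow"))
```

In this sketch the query hypothesis is matched with a simple boolean AND over an inverted index; a real indexing tool would typically add ranking and tolerance to recognition errors in the query.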
Keywords
- Speech Recognition
- Language Model
- Audio Signal
- Automatic Speech Recognition
- Information Retrieval System
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Salgado-Garza, L.R., Nolazco-Flores, J.A. (2004). On the Use of Automatic Speech Recognition for Spoken Information Retrieval from Video Databases. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2004. Lecture Notes in Computer Science, vol 3287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30463-0_47
DOI: https://doi.org/10.1007/978-3-540-30463-0_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23527-9
Online ISBN: 978-3-540-30463-0
eBook Packages: Springer Book Archive