On the Use of Automatic Speech Recognition for Spoken Information Retrieval from Video Databases

Salgado-Garza, Luis R.; Nolazco-Flores, Juan A.

doi:10.1007/978-3-540-30463-0_47

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3287)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

This document describes the implementation of a spoken information retrieval system and its application to word search in an indexed video database. The system uses automatic speech recognition (ASR) software to convert the audio signal of a video file into a transcript file, and a document indexing tool then indexes this transcript. A spoken query, uttered by any user, is presented to the ASR, which decodes the audio signal and proposes a hypothesis that is then used to formulate a query against the indexed database. The final output of the system is a list of video frame tags whose audio corresponds to the spoken query. The speech recognition system achieved a Word Error Rate (WER) below 15%, and its combined operation with the document indexing system showed outstanding performance with spoken queries.
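The retrieval flow described in the abstract can be illustrated with a minimal sketch. It assumes the ASR output is already available as text segments paired with frame tags, and it uses a simple in-memory inverted index in place of the document indexing tool, which the abstract does not name. The transcript data, the functions `build_index` and `search`, and the rule that a segment must contain every query word are all hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical transcript: (frame_tag, recognized_text) pairs, standing in
# for what an ASR pass over a video's audio track might produce.
transcript = [
    ("00:00:05", "welcome to the evening news"),
    ("00:00:12", "the weather tomorrow will be sunny"),
    ("00:00:20", "in sports news the local team won"),
]

def build_index(segments):
    """Map each word to the sorted list of frame tags whose audio contains it."""
    index = defaultdict(set)
    for tag, text in segments:
        for word in text.lower().split():
            index[word].add(tag)
    return {word: sorted(tags) for word, tags in index.items()}

def search(index, hypothesis):
    """Return the frame tags that contain every word of the decoded query.

    Requiring all words to match is an assumption of this sketch.
    """
    words = hypothesis.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

index = build_index(transcript)

# Each query string stands in for the ASR hypothesis of a spoken query.
print(search(index, "news"))         # ['00:00:05', '00:00:20']
print(search(index, "sports news"))  # ['00:00:20']
```

In this sketch a recognition error in either the transcript or the query hypothesis simply causes a missed match, which is one reason the recognizer's word error rate matters for retrieval quality.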

Author information

Authors and Affiliations

Computer Science Department, ITESM, Campus Monterrey, Av. Eugenio Garza Sada 2501 Sur, Col. Tecnológico, Monterrey, N.L., C.P. 64849, México
Luis R. Salgado-Garza & Juan A. Nolazco-Flores

Editor information

Editors and Affiliations

Dept. System Engineering and Automation, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
Alberto Sanfeliu
Computer Science Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, 72840, Sta. Maria Tonantzintla, Puebla, Mexico
José Francisco Martínez Trinidad
Computer Science Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, 72840, Sta. Maria Tonantzintla, Puebla, Mexico
Jesús Ariel Carrasco Ochoa

Cite this paper

Salgado-Garza, L.R., Nolazco-Flores, J.A. (2004). On the Use of Automatic Speech Recognition for Spoken Information Retrieval from Video Databases. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2004. Lecture Notes in Computer Science, vol 3287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30463-0_47

DOI: https://doi.org/10.1007/978-3-540-30463-0_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23527-9
Online ISBN: 978-3-540-30463-0
eBook Packages: Springer Book Archive
