Abstract
We address the problem of spoken document retrieval (also termed content-based audio search and retrieval), which involves searching a large spoken document or database for occurrences of a given spoken query. We formulate the search within the sub-sequence dynamic time warping (SS-DTW) framework proposed earlier in the literature, adapted here to operate on acoustic feature representations of the database and the spoken query term. Within this framework we propose several variants: (i) path-length-based score normalization, (ii) clustered quantization of the acoustic feature vectors, for faster search and retrieval with unchanged performance, and (iii) phonetic representation of the database and the spoken query term, derived both from ground-truth annotation and from HMM-based continuous phoneme recognition. We characterize the performance of the proposed framework, algorithms and variants in terms of ROC curves, equal error rate (EER) and time complexity, and present results on the TIMIT database, using annotated spoken sentences from 400 speakers.
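To make the search formulation concrete, the following is a minimal sketch of sub-sequence DTW with path-length score normalization. It is not the authors' implementation. The function names (`euclidean`, `ss_dtw`), the Euclidean local distance, the basic three-step pattern (vertical, horizontal, diagonal) and the toy one-dimensional features are all assumptions made for illustration. The recursion minimizes the unnormalized accumulated cost and divides by the path length only when choosing the end point, which is one common simplification.

```python
import math


def euclidean(u, v):
    """Local distance between two feature vectors (assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ss_dtw(query, database, dist=euclidean):
    """Find the database segment that best matches the query.

    Returns (normalized_score, start_index, end_index), where the score is
    the accumulated path cost divided by the number of cells on the path.
    """
    n, m = len(query), len(database)
    # Each cell stores (accumulated cost, path length, start column).
    # First row: a path may start at any database frame at no extra cost.
    prev = [(dist(query[0], database[j]), 1, j) for j in range(m)]
    for i in range(1, n):
        cur = []
        for j in range(m):
            candidates = [prev[j]]                # vertical step
            if j > 0:
                candidates.append(cur[j - 1])     # horizontal step
                candidates.append(prev[j - 1])    # diagonal step
            best = min(candidates, key=lambda cell: cell[0])
            local = dist(query[i], database[j])
            cur.append((best[0] + local, best[1] + 1, best[2]))
        prev = cur
    # Last row: a path may end at any database frame; pick the end point
    # with the lowest path-length-normalized cost.
    end = min(range(m), key=lambda j: prev[j][0] / prev[j][1])
    cost, length, start = prev[end]
    return cost / length, start, end


# Toy example: 1-D "features"; the query occurs at database frames 2..4.
database = [[0], [5], [1], [2], [3], [9], [0]]
query = [[1], [2], [3]]
print(*ss_dtw(query, database))  # 0.0 2 4
```

In a retrieval setting, the normalized score would be compared against a threshold to accept or reject a detection, and sweeping that threshold is what yields an ROC curve. Dividing by path length keeps scores comparable across alignments of different lengths.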
A. Seshasayee—Author carried out this work as Research Associate at PESIT-BSC, Bangalore.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Khatwani, A., Pawar, K., Hegde, S., Rao, S., Seshasayee, A., Ramasubramanian, V. (2015). Spoken Document Retrieval: Sub-sequence DTW Framework and Variants. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_29
DOI: https://doi.org/10.1007/978-3-319-26832-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)