Abstract
Spoken keyword detection is essential for efficiently organizing many hours of audio content such as meetings and radio news. Keyword detection systems are developed to index large audio databases or to detect keywords in continuous speech streams. This paper presents a new approach to spoken keyword detection that exploits the distribution-capturing ability of autoassociative neural networks (AANNs). The method slides a frame-based keyword template along the speech signal and uses a confidence score, obtained from the normalized squared error of the AANN, to search efficiently for a match. The experimental results show that the proposed approach is competitive with keyword detection methods reported in the literature and offers an alternative to them.
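The sliding-template search described above can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions. A linear autoassociator fitted by SVD (equivalent to PCA) stands in for a trained nonlinear AANN. The confidence score is assumed to be exp(-E), where E is the squared reconstruction error divided by the squared norm of the input frame. The window score is assumed to be the mean frame confidence. The function names, feature dimension, and window length are all hypothetical.

```python
import numpy as np


def fit_linear_autoassociator(frames, n_components):
    """Fit a linear stand-in for an AANN on keyword feature frames.

    Assumption: PCA via SVD replaces the trained nonlinear network.
    Returns the frame mean and an (n_components, dim) orthonormal basis.
    """
    mean = frames.mean(axis=0)
    # Rows of vt are the principal directions of the centred frames.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:n_components]


def frame_confidence(frames, mean, basis):
    """Per-frame confidence from the normalized squared reconstruction error.

    Assumption: confidence = exp(-E), with E = ||x - x_hat||^2 / ||x||^2.
    """
    recon = (frames - mean) @ basis.T @ basis + mean
    err = np.sum((frames - recon) ** 2, axis=1)
    norm = np.sum(frames ** 2, axis=1) + 1e-12  # guard against zero frames
    return np.exp(-err / norm)


def sliding_scores(stream, mean, basis, window):
    """Mean confidence over every run of `window` consecutive frames.

    Entry i scores frames i .. i + window - 1 of the stream.
    """
    conf = frame_confidence(stream, mean, basis)
    kernel = np.ones(window) / window
    return np.convolve(conf, kernel, mode="valid")
```

Under these assumptions, the index of the highest window score is the most likely keyword start frame, and comparing that score with a threshold would decide whether the keyword is present at all. Scoring each frame once and then averaging keeps the search linear in the number of frames.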
Cite this article
Jothilakshmi, S. Spoken keyword detection using autoassociative neural networks. Int J Speech Technol 17, 83–89 (2014). https://doi.org/10.1007/s10772-013-9208-2
DOI: https://doi.org/10.1007/s10772-013-9208-2