Extension of a Kernel-Based Classifier for Discriminative Spoken Keyword Spotting

Neural Processing Letters

Abstract

A keyword spotter can be viewed as a binary classifier that separates utterances containing a target keyword from utterances that do not. These two classes are not inherently linearly separable, so linear classifiers are not fully suitable for this task. In this paper, we extend a kernel-based classification approach to separate these two non-linearly separable classes so that the area under the Receiver/Relative Operating Characteristic (ROC) curve (AUC), the most common measure for keyword spotter evaluation, is maximized. We evaluated the proposed keyword spotter under different experimental conditions on the TIMIT database. The results indicate that, at fewer than two false alarms per keyword per hour, the true detection rate of the proposed kernel-based classification approach is about 15 % greater than that of the linear classifiers used in previous research. Additionally, the AUC of the proposed method is 1 % higher than that of the linear classifiers, a difference that is significant at confidence levels of 80 % and 95 % according to t-test and F-test evaluations, respectively. We also evaluated the proposed method under different noisy conditions; the results indicate that it shows good robustness in noisy conditions.
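To make the AUC criterion concrete, the following minimal sketch computes the AUC as the Wilcoxon-Mann-Whitney statistic, that is, the fraction of (keyword, non-keyword) pairs that a scoring function ranks correctly. It is an illustration only and not the method proposed in the paper: the XOR-style toy points, the hand-set weights, and the names `auc_wmw`, `rbf_kernel`, and `kernel_score` are all assumptions introduced here, and nothing is learned from data.

```python
import math


def auc_wmw(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_score(x, support, alphas, gamma=1.0):
    """Kernel expansion f(x) = sum_i alpha_i * k(x, s_i)."""
    return sum(a * rbf_kernel(x, s, gamma) for a, s in zip(alphas, support))


# Hypothetical toy data: an XOR layout, which no linear scorer can separate.
positives = [(0.0, 0.0), (1.0, 1.0)]  # stand-ins for "keyword present"
negatives = [(0.0, 1.0), (1.0, 0.0)]  # stand-ins for "keyword absent"

# Linear scorer with hand-picked (assumed) weights w = (1, 1).
w = (1.0, 1.0)
linear = lambda x: sum(wi * xi for wi, xi in zip(w, x))
linear_auc = auc_wmw([linear(x) for x in positives],
                     [linear(x) for x in negatives])

# Kernel scorer with hand-set (assumed) coefficients: +1 on positives, -1 on negatives.
support = positives + negatives
alphas = [1.0, 1.0, -1.0, -1.0]
kernel_auc = auc_wmw([kernel_score(x, support, alphas) for x in positives],
                     [kernel_score(x, support, alphas) for x in negatives])

print(linear_auc, kernel_auc)  # prints: 0.5 1.0
```

On this toy layout the linear scorer ranks only half of the pairs correctly, while the kernel expansion ranks all of them correctly, which is the kind of gap a kernel-based spotter aims to exploit when the two classes are not linearly separable.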

Acknowledgments

We would like to thank Iran Telecommunication Research Center for its support of this project.

Author information

Corresponding author

Correspondence to Shima Tabibian.

About this article

Cite this article

Tabibian, S., Akbari, A. & Nasersharif, B. Extension of a Kernel-Based Classifier for Discriminative Spoken Keyword Spotting. Neural Process Lett 39, 195–218 (2014). https://doi.org/10.1007/s11063-013-9299-4

  • DOI: https://doi.org/10.1007/s11063-013-9299-4
