Abstract
Hidden Markov Models (HMMs) are undoubtedly the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Several alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. Despite some achievements, however, Markov models remain predominant today.
During the last decade, however, a new tool has appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: the Support Vector Machine (SVM). SVMs are effective discriminative classifiers with several outstanding characteristics: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. In the first part we review several techniques for producing the fixed-dimension vectors that standard SVMs require. We then explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which revives an old speech recognition technique: Dynamic Time Warping (DTW). In the second part, we describe some recent approaches that use SVMs to tackle more complex tasks such as connected digit recognition and continuous speech recognition. Finally, we draw some conclusions and outline several ongoing lines of research.
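As a rough illustration of the alignment idea behind DTW mentioned above, the following minimal sketch computes a classic DTW distance between two feature sequences of different lengths. It is not the DTAK kernel itself and is not taken from the chapter; the function name `dtw_distance`, the Euclidean local distance, and the symmetric step pattern are assumptions chosen for brevity.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    `a` and `b` are lists of equal-dimension float vectors; the sequences
    themselves may differ in length. Local distance: Euclidean (assumed).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Two utterances of different length: the second repeats one frame
short = [[0.0], [1.0], [2.0]]
long_ = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(short, long_))
```

Because the repeated frame can be absorbed by the warping path, the two sequences align at zero cost even though their lengths differ, which is the property that sequence kernels of this family exploit.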
Keywords
- Support Vector Machine
- Speech Recognition
- Automatic Speech Recognition
- Dynamic Time Warping
- Fisher Kernel
© 2007 Springer Berlin Heidelberg
About this chapter
Cite this chapter
Solera-Ureña, R., Padrell-Sendra, J., Martín-Iglesias, D., Gallardo-Antolín, A., Peláez-Moreno, C., Díaz-de-María, F. (2007). SVMs for Automatic Speech Recognition: A Survey. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_11
DOI: https://doi.org/10.1007/978-3-540-71505-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science (R0)