A new hybrid framework based on Hidden Markov models and K-nearest neighbors for speech recognition

Hazmoune, Samira; Bougamouza, Fateh; Mazouzi, Smaine; Benmohammed, Mohamed

doi:10.1007/s10772-018-9535-4

Published: 16 July 2018

International Journal of Speech Technology, Volume 21, pages 689–704 (2018)

Abstract

Hidden Markov models (HMMs) have proved successful in several research areas, especially in speech recognition. However, a major drawback of the HMM classifier is its sensitivity to some initial parameters, such as the number of states, which must be tuned carefully. Indeed, it is well known that a number of states that suits one utterance may not perform as well for other utterances of the same class. To deal with this problem, and to take some levels of data variability into account, we investigate a new hybrid framework for speech recognition in which the HMM classifier is integrated within the K-nearest neighbors (KNN) architecture. In this framework, we propose to build several HMMs that differ in their numbers of states to represent each class of data, and to use the KNN rule to select the K nearest models and the most represented class among them, using the Viterbi likelihood as the similarity measure. To remove ambiguity during the decision step, we propose two different methods. The proposed framework is evaluated on the UCI Spoken Arabic Digit dataset. The results show the effectiveness of our approach, both when compared with the HMM and KNN baselines and when compared with previous work on the same dataset.
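
The decision rule outlined in the abstract can be illustrated with a minimal sketch. It assumes that every trained HMM has already been scored against the test utterance, so each model is reduced to a (class label, number of states, Viterbi log-likelihood) triple. The function name `knn_over_models`, the example scores, and in particular the tie-breaking rule are assumptions made for illustration only: the abstract states that two methods are proposed for removing ambiguity but does not describe them, so the rule used here is a stand-in and not the authors' method.

```python
from collections import Counter
from typing import List, Tuple

# One trained HMM scored on the test utterance:
# (class_label, number_of_states, viterbi_log_likelihood)
ModelScore = Tuple[str, int, float]


def knn_over_models(scores: List[ModelScore], k: int) -> str:
    """Return the class most represented among the k best-scoring HMMs."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of models")

    # A higher Viterbi log-likelihood means the model is "nearer" to the utterance.
    nearest = sorted(scores, key=lambda s: s[2], reverse=True)[:k]

    # Majority vote over the class labels of the k nearest models.
    votes = Counter(label for label, _, _ in nearest)
    top_count = max(votes.values())
    tied = [label for label, count in votes.items() if count == top_count]
    if len(tied) == 1:
        return tied[0]

    # ASSUMPTION (not from the paper): break ties in favour of the class whose
    # best model among the k nearest has the highest log-likelihood.
    best = {}
    for label, _, log_lik in nearest:
        if label in tied:
            best[label] = max(best.get(label, float("-inf")), log_lik)
    return max(best, key=best.get)


# Hypothetical scores: two digit classes, three HMMs each (3, 5 and 7 states).
example_scores = [
    ("zero", 3, -410.2), ("zero", 5, -395.7), ("zero", 7, -402.1),
    ("one", 3, -398.4), ("one", 5, -420.9), ("one", 7, -399.0),
]

if __name__ == "__main__":
    for k in (1, 3, 4):
        print(k, knn_over_models(example_scores, k))
```

With these made-up scores, K = 1 reduces to the usual maximum-likelihood decision, while K = 3 lets two moderately good models of one class outvote the single best model of another; K = 4 produces a tie and exercises the assumed tie-breaking rule.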

Author information

Authors and Affiliations

Department of Computer Science, University of 20 août 1955, Skikda, Algeria
Samira Hazmoune, Fateh Bougamouza & Smaine Mazouzi
(LIRE) Distributed Computer Science Laboratory, University of Constantine 2 Abdelhamid Mehri, Constantine, Algeria
Mohamed Benmohammed

Corresponding author

Correspondence to Samira Hazmoune.

About this article

Cite this article

Hazmoune, S., Bougamouza, F., Mazouzi, S. et al. A new hybrid framework based on Hidden Markov models and K-nearest neighbors for speech recognition. Int J Speech Technol 21, 689–704 (2018). https://doi.org/10.1007/s10772-018-9535-4

Received: 26 October 2017
Accepted: 07 July 2018
Published: 16 July 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10772-018-9535-4
