A new hybrid framework based on Hidden Markov models and K-nearest neighbors for speech recognition

International Journal of Speech Technology

Abstract

Hidden Markov models (HMMs) have proved successful in several research areas, especially in speech recognition. However, a major drawback of the HMM classifier is its sensitivity to some initial parameters, such as the number of states, which need to be tuned carefully. In fact, it is well known that a number of states suitable for one utterance may not perform as well for other utterances of the same class. To deal with this problem, and to take some levels of data variability into consideration, we investigate a new hybrid framework for speech recognition in which we integrate the HMM classifier within the K-nearest neighbors (KNN) architecture. In this framework, we propose to build several HMMs, differing in their numbers of states, to represent each class of data, and to use the KNN rule to select the K nearest models and the most represented class among them, using the Viterbi likelihood as the similarity measure. To remove ambiguity during the decision step, we propose two different methods. The proposed framework is evaluated on the UCI Spoken Arabic Digit dataset. The results show the effectiveness of our approach, both when compared to the HMM and KNN baselines and when compared to previous works on the same dataset.

Author information

Corresponding author

Correspondence to Samira Hazmoune.

About this article

Cite this article

Hazmoune, S., Bougamouza, F., Mazouzi, S. et al. A new hybrid framework based on Hidden Markov models and K-nearest neighbors for speech recognition. Int J Speech Technol 21, 689–704 (2018). https://doi.org/10.1007/s10772-018-9535-4

  • DOI: https://doi.org/10.1007/s10772-018-9535-4
