Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition

Abstract

In an attempt to increase the Arabic phoneme recognition rate, we introduce a novel hybrid recognition algorithm that combines learning vector quantization (LVQ) with a hidden Markov model (HMM). The hybrid algorithm is used to recognize Arabic phonemes in continuous, open-vocabulary speech. A recorded corpus of Modern Standard Arabic taken from different TV news broadcasts was used for training and testing. We employ a data-driven approach to generate training feature vectors that embed the correlation between neighboring frames. Next, we generate the phoneme codebooks using the K-means splitting algorithm and then train the generated codebooks using the LVQ algorithm. We achieved a performance of 98.49 % during independent classification training and 90 % during dependent classification training. When the trained LVQ codebooks were used for Arabic utterance transcription, the phoneme recognition rate was 72 % using LVQ only. We then combined the LVQ codebooks with a single-state HMM using an enhanced Viterbi algorithm that incorporates phoneme bigrams. The hybrid LVQ/HMM algorithm achieved an Arabic phoneme recognition rate of 89 %.
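The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the basic LVQ1 update rule, negative squared Euclidean distance as the per-frame score, one HMM state per phoneme, and a plain Viterbi pass whose transition scores are phoneme bigram log-probabilities. The function names (`lvq1_train`, `frame_scores`, `viterbi_bigram`) and all parameter values are hypothetical, and the initial codebook is taken as given rather than produced by K-means splitting.

```python
import numpy as np

def lvq1_train(codebook, labels, X, y, lr=0.05, epochs=10):
    """LVQ1 (assumed variant): pull the winning codeword toward a sample of
    the same class, push it away from a sample of a different class."""
    codebook = np.asarray(codebook, dtype=float).copy()
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x, target in zip(X, y):
            winner = np.argmin(np.sum((codebook - x) ** 2, axis=1))
            sign = 1.0 if labels[winner] == target else -1.0
            codebook[winner] += sign * alpha * (x - codebook[winner])
    return codebook

def frame_scores(codebook, labels, X, n_classes):
    """Score each frame per class as the negative squared distance to the
    closest codeword of that class (every class needs >= 1 codeword)."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (T, K)
    scores = np.empty((len(X), n_classes))
    for c in range(n_classes):
        scores[:, c] = -d[:, labels == c].min(axis=1)
    return scores

def viterbi_bigram(scores, log_bigram, log_prior):
    """Most likely phoneme label per frame with one state per phoneme.
    log_bigram[i, j] is the log-probability of moving from phoneme i to j."""
    T, C = scores.shape
    delta = log_prior + scores[0]
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_bigram   # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + scores[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In this sketch the bigram term acts as a smoother: an isolated frame whose nearest codeword belongs to the wrong phoneme can be overruled by its neighbors, which is one plausible reading of why the hybrid decoder outperforms frame-by-frame LVQ classification.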

Notes

  1. Except for the first 3 frames and the last 3 frames in the feature matrix.
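One way to read this note is that each training vector is built from a frame together with its three left and three right neighbors, so the first and last three frames have no complete window. The sketch below illustrates that reading only; the half-width `k=3` is inferred from the note, and concatenation as the way of embedding neighboring-frame information, as well as the function name `stack_context`, are assumptions.

```python
import numpy as np

def stack_context(features, k=3):
    """Concatenate each frame with its k left and k right neighbors.

    features: array of shape (T, D), one row per frame.
    Returns shape (T - 2k, (2k + 1) * D); the first and last k frames
    produce no output row because their window would be incomplete.
    """
    T, D = features.shape
    if T < 2 * k + 1:
        return np.empty((0, (2 * k + 1) * D))
    # Window i holds the frame at offset i - k from each center frame.
    windows = [features[i:T - 2 * k + i] for i in range(2 * k + 1)]
    return np.hstack(windows)
```

With a 10-frame, 2-dimensional feature matrix and `k=3`, this yields 4 stacked vectors of dimension 14, centered on frames 3 through 6.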

Author information

Corresponding author

Correspondence to Khalid M. O. Nahar.

About this article

Cite this article

Nahar, K.M.O., Abu Shquier, M., Al-Khatib, W.G. et al. Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition. Int J Speech Technol 19, 495–508 (2016). https://doi.org/10.1007/s10772-016-9337-5

  • DOI: https://doi.org/10.1007/s10772-016-9337-5

Keywords

  • Learning vector quantization (LVQ)
  • Codebooks
  • K-means algorithm
  • Phonemes transcription
  • Hidden Markov model (HMM)
  • Hybrid LVQ/HMM model