


A heterogeneous speech feature vectors generation approach with hybrid hmm classifiers

International Journal of Speech Technology

Abstract

An automatic speech recognition (ASR) system plays a vital role in human–machine interaction. ASR systems face the challenge of performance degradation caused by a mismatch between the training and testing phases. This mismatch arises from the extraction and representation of erroneous and redundant feature vectors. This paper proposes three different feature combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, mel-frequency cepstral coefficients (MFCC), RASTA-PLP, and perceptual linear prediction (PLP) features are combined in different ways. In the modeling phase, the mean and variance are calculated to generate inter-class and intra-class feature vectors. These feature vectors are then used by an optimization algorithm, together with a traditional statistical technique, to generate refined feature vectors. The approach uses the GA (genetic algorithm) + HMM and DE (differential evolution) + HMM techniques to produce refined model parameters. Experiments are conducted on large-vocabulary datasets of isolated Punjabi lexicons. The simulation results show improved performance for MFCC features with the DE + HMM technique compared with RASTA-PLP and PLP features, using hybrid HMM classifiers.
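The abstract reports the best results for the DE + HMM hybrid, in which differential evolution refines model parameters. The exact DE variant, parameter encoding, and objective used in the paper are not given here, so the following is only a minimal sketch under stated assumptions: it uses the classic DE/rand/1/bin scheme of Storn and Price, and it substitutes a single Gaussian emission density (the output distribution of a one-state HMM) for a full HMM, maximising its log-likelihood on synthetic 1-D feature values. The function names `differential_evolution` and `gaussian_log_likelihood`, the bounds, and all control parameters (`f_weight`, `crossover_rate`, `pop_size`, `generations`) are hypothetical choices for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    fitness: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    f_weight: float = 0.6,
    crossover_rate: float = 0.9,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximise `fitness` inside a box using the DE/rand/1/bin scheme (hypothetical helper)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population drawn uniformly inside the parameter bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutation: donor = x_a + F * (x_b - x_c), clipped to the bounds.
            donor = [
                min(max(pop[a][k] + f_weight * (pop[b][k] - pop[c][k]), lo), hi)
                for k, (lo, hi) in enumerate(bounds)
            ]
            # Binomial crossover; index j_rand always takes the donor gene.
            j_rand = rng.randrange(dim)
            trial = [
                donor[k] if (k == j_rand or rng.random() < crossover_rate) else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection, applied immediately: the trial replaces the
            # target if it is no worse.
            trial_score = fitness(trial)
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def gaussian_log_likelihood(params: Sequence[float], frames: Sequence[float]) -> float:
    """Log-likelihood of 1-D feature values under N(mean, exp(log_std)**2)."""
    mean, log_std = params
    var = math.exp(2.0 * log_std)
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var) for x in frames)


if __name__ == "__main__":
    # Synthetic stand-in for one feature dimension of one class (hypothetical data).
    data_rng = random.Random(1)
    frames = [data_rng.gauss(2.0, 0.5) for _ in range(200)]
    best, score = differential_evolution(
        lambda p: gaussian_log_likelihood(p, frames),
        bounds=[(-5.0, 5.0), (-3.0, 2.0)],
    )
    print(f"mean={best[0]:.4f} std={math.exp(best[1]):.4f} loglik={score:.2f}")
```

Because a Gaussian's maximum-likelihood parameters have a closed form (the sample mean and the biased sample standard deviation), this toy objective makes it easy to check that the search converges. In an actual GA + HMM or DE + HMM system the fitness would plausibly be an HMM score, such as the forward-algorithm log-likelihood or recognition accuracy on held-out data, and the encoded vector would hold many more model parameters.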




Acknowledgements

This work is partially supported by a research project on the Punjabi language funded by IEEE SIGHT. The benchmark sample corpus was collected from CDAC Pune. I would especially like to thank Dr. Amitoj Singh, Assistant Professor, MRSPTU, Bathinda, and Ms. Sumanpreet Virk, Associate Professor, Punjabi University, Patiala, for guiding me in Punjabi phonology, which helped in verifying and generating the speech corpus.

Author information

Corresponding author

Correspondence to Virender Kadyan.


About this article


Cite this article

Kadyan, V., Mantri, A. & Aggarwal, R.K. A heterogeneous speech feature vectors generation approach with hybrid hmm classifiers. Int J Speech Technol 20, 761–769 (2017). https://doi.org/10.1007/s10772-017-9446-9


  • DOI: https://doi.org/10.1007/s10772-017-9446-9
