Improved Phone Recognition Using Excitation Source Features
Phone recognizers serve as the preprocessing unit for speech recognition systems and phonetic engines. Although most state-of-the-art speech recognition systems achieve relatively high accuracy at the sentence level, their phone-level recognition performance falls well below it. The higher recognition rates at the sentence level are achieved with the help of refined language models for the language under consideration. The objective of the present work is therefore to improve the phoneme-level accuracy of hidden Markov model (HMM) based acoustic phone models for American English by combining excitation source features with the conventional mel frequency cepstral coefficients (MFCC). The TIMIT and CMU Arctic databases are used for the experiments. The average spectral energy around the zero-frequency region of each frame is used as the excitation source feature and is combined with the 13 MFCC features. The effectiveness of the combined features is confirmed by a 0.5% increase in phone recognition accuracy over state-of-the-art HMM-GMM acoustic models with MFCC features.
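As a rough illustration of the feature combination described above, the following Python sketch computes the average spectral energy in a low-frequency band of each frame and appends it to a 13-dimensional MFCC vector, giving 14 features per frame. It is not the authors' implementation: the 100 Hz band edge, the 25 ms frame with a 10 ms hop, the Hamming window, and all function names are assumptions made for illustration, since the abstract does not specify them. The MFCC values are assumed to come from an external front end.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def low_band_energy(frame, sample_rate, cutoff_hz):
    """Average power of the DFT bins from 0 Hz up to cutoff_hz.

    cutoff_hz is an assumed band edge, not a value taken from the paper.
    """
    n = len(frame)
    # Hamming window to limit leakage from higher-frequency components.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    max_bin = int(cutoff_hz * n / sample_rate)
    powers = []
    for k in range(max_bin + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(windowed))
        powers.append(abs(coeff) ** 2)
    return sum(powers) / len(powers)


def append_source_feature(mfcc_frames, samples, sample_rate,
                          frame_len, hop, cutoff_hz=100.0):
    """Append the low-band energy to each MFCC vector (13 -> 14 dimensions)."""
    frames = frame_signal(samples, frame_len, hop)
    if len(frames) != len(mfcc_frames):
        raise ValueError("MFCC frame count does not match signal framing")
    return [list(mfcc) + [low_band_energy(f, sample_rate, cutoff_hz)]
            for mfcc, f in zip(mfcc_frames, frames)]


if __name__ == "__main__":
    sr = 16000
    frame_len, hop = 400, 160  # 25 ms frames, 10 ms hop (assumed)
    # Synthetic 50 Hz tone standing in for 0.1 s of speech.
    tone = [math.sin(2 * math.pi * 50 * t / sr) for t in range(1600)]
    n_frames = len(frame_signal(tone, frame_len, hop))
    # Placeholder MFCCs; real values would come from an MFCC front end.
    mfcc = [[0.0] * 13 for _ in range(n_frames)]
    combined = append_source_feature(mfcc, tone, sr, frame_len, hop)
    print(len(combined), len(combined[0]))
```

The framing used for the energy feature has to match the framing used for the MFCCs so that the two streams align frame by frame; the sketch raises an error when the counts differ rather than padding silently.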