Speaker Independent Word Recognition Using Cepstral Distance Measurement
Speech recognition has developed from theoretical methods into practical systems. Since the 1990s, researchers have turned their interest to the difficult task of Large Vocabulary Continuous Speech Recognition (LVCSR) and have made great progress. Many well-known research and commercial institutions have built recognition systems, including the ViaVoice system by IBM and the Whisper system by Microsoft. In this paper we develop a simple and efficient algorithm for speaker-independent isolated word recognition. We use Mel-frequency cepstral coefficients (MFCCs) as features of the recorded speech. A decoding algorithm is proposed that recognizes the target word by computing the cepstral distance between the cepstral coefficients of the input and those of stored templates. Simulation experiments were carried out in MATLAB, where the method produced relatively good results (85% word recognition accuracy).
Keywords: Speech Recognition, Speech Signal, Automatic Speech Recognition, Speech Recognition System, Continuous Speech Recognition
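The pipeline described in the abstract — extract MFCCs from each utterance, then classify an input word by its minimum cepstral distance to a set of stored word templates — can be sketched as follows. This is a minimal illustrative sketch, not the authors' MATLAB implementation: the function names, frame size, hop length, filter count, and the truncation-based frame alignment are all assumptions, since the paper's abstract does not specify these details.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filterbank (mel = 2595*log10(1 + f/700))."""
    mel_max = 2595 * np.log10(1 + (sr / 2) / 700)
    mel_pts = np.linspace(0, mel_max, n_filters + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):                  # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                 # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """Frame, window, power spectrum, mel filterbank, log, DCT-II."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2
    fb = mel_filterbank(n_filters, frame_len, sr)
    log_mel = np.log(power @ fb.T + 1e-10)             # avoid log(0)
    # DCT-II basis to decorrelate the log mel energies into cepstra
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                    (2 * n + 1) / (2 * n_filters)))
    return log_mel @ basis.T                           # (n_frames, n_ceps)

def cepstral_distance(a, b):
    """Mean per-frame Euclidean distance; truncate to the shorter utterance
    (an assumed alignment -- DTW would be a common alternative)."""
    m = min(len(a), len(b))
    return np.mean(np.linalg.norm(a[:m] - b[:m], axis=1))

def recognize(utterance, templates, sr=8000):
    """Return the template word with the smallest cepstral distance."""
    feats = mfcc(utterance, sr)
    return min(templates,
               key=lambda w: cepstral_distance(feats, mfcc(templates[w], sr)))
```

A usage sketch: with a dictionary `templates` mapping each vocabulary word to one recorded reference signal, `recognize(test_signal, templates)` returns the word whose template lies closest in cepstral space.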