Pronunciation Similarity Estimation for Spoken Language Learning
This paper presents an approach for estimating the pronunciation similarity between two speakers using the cepstral distance. General speech recognition systems find the words a speaker has uttered by combining the acoustic score of the speech signal with the grammatical score of the word sequence. In language learning, particularly for a speaker with impaired hearing, it is not easy to estimate pronunciation similarity with automatic speech recognition systems, because the task requires more information about pronunciation characteristics than about word matching. This is a new challenge for computer-aided pronunciation learning. The dynamic time warping algorithm is used to compute the cepstral distance between two speech samples, and a codebook distance is subtracted to account for the characteristics of each speaker. Experiments on the Korean fundamental vowel set show that the similarity of two speakers' pronunciation can be computed efficiently by computer.
Keywords: Speech Recognition, Speech Signal, Dynamic Time Warping, Decision Threshold, Confidence Measure
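The abstract describes the distance computation only in outline, so the following is a minimal sketch rather than the paper's method. It aligns two sequences of cepstral frames with dynamic time warping and then subtracts a distance between the two speakers' codebooks. Several details are assumptions made for illustration: the Euclidean frame distance, the normalisation of the DTW cost by the summed sequence lengths, the symmetric nearest-codeword form of `codebook_distance`, and all function names.

```python
import numpy as np


def dtw_distance(x, y):
    """Length-normalised DTW cost between two cepstral sequences.

    x has shape (n, d) and y has shape (m, d); each row is one frame.
    Euclidean frame distance and (n + m) normalisation are assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Pairwise distance between every frame of x and every frame of y.
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # acc[i, j] is the cheapest cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)


def codebook_distance(cb_a, cb_b):
    """Symmetric mean nearest-codeword distance (a hypothetical definition)."""
    cb_a = np.asarray(cb_a, dtype=float)
    cb_b = np.asarray(cb_b, dtype=float)
    d = np.linalg.norm(cb_a[:, None, :] - cb_b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


def pronunciation_distance(x, y, cb_x, cb_y):
    """DTW cepstral distance with the speakers' codebook distance subtracted."""
    return dtw_distance(x, y) - codebook_distance(cb_x, cb_y)
```

Under these assumptions, a smaller value of `pronunciation_distance` would indicate more similar pronunciation once the general difference between the two speakers' voices has been discounted. How the codebooks are trained and how the result maps to a decision threshold are not stated in the abstract and are left open here.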