Pronunciation Similarity Estimation for Spoken Language Learning

  • Donghyun Kim
  • Dongsuk Yook
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4285)


This paper presents an approach for estimating the pronunciation similarity between two speakers using the cepstral distance. Conventional speech recognition systems find the words a speaker has uttered by combining the acoustic score of the speech signal with the grammatical score of the word sequence. For a language learner with impaired hearing, however, it is difficult to estimate pronunciation similarity with an automatic speech recognition system, since the task requires information about pronunciation characteristics rather than word matching. This poses a new challenge for computer-aided pronunciation learning. The dynamic time warping algorithm is used to compute the cepstral distance between two speech signals, with the codebook distance subtracted to account for the characteristics of each speaker. Experiments on the Korean fundamental vowel set show that the pronunciation similarity of two speakers can be computed efficiently.
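The core computation described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the Euclidean frame distance, the path-length normalization, and the symmetric form of the codebook distance are all assumptions made for the sketch.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between two cepstral feature sequences
    x (n, d) and y (m, d), using a Euclidean frame distance
    and normalizing by the path-length bound n + m."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m] / (n + m)

def codebook_distance(cb_a, cb_b):
    """Symmetrized average nearest-codeword distance between two
    speaker codebooks; a rough proxy for the inter-speaker
    acoustic mismatch (an assumed form, for illustration)."""
    d_ab = np.mean([np.min(np.linalg.norm(cb_b - c, axis=1)) for c in cb_a])
    d_ba = np.mean([np.min(np.linalg.norm(cb_a - c, axis=1)) for c in cb_b])
    return 0.5 * (d_ab + d_ba)

def pronunciation_dissimilarity(x, y, cb_a, cb_b):
    """DTW cepstral distance with the speaker codebook distance
    subtracted, clipped at zero so the result stays nonnegative."""
    return max(dtw_distance(x, y) - codebook_distance(cb_a, cb_b), 0.0)
```

Subtracting the codebook distance is intended to discount the portion of the cepstral distance that is due to speaker differences rather than pronunciation differences, so that two speakers with different voices but similar articulation score as similar.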


Keywords: Speech Recognition · Speech Signal · Dynamic Time Warping · Decision Threshold · Confidence Measure




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Donghyun Kim¹
  • Dongsuk Yook¹
  1. Speech Information Processing Laboratory, Department of Computer Science and Engineering, Korea University, Seoul, Korea
