Pronunciation Similarity Estimation for Spoken Language Learning

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 4285)

Abstract

This paper presents an approach for estimating the pronunciation similarity between two speakers using the cepstral distance. General speech recognition systems find the words a speaker has uttered by combining the acoustic score of the speech signal with the grammatical score of the word sequence. For a language learner with impaired hearing, however, automatic speech recognition systems are poorly suited to estimating pronunciation similarity, because the task requires more information about pronunciation characteristics than about word matching. This poses a new challenge for computer-aided pronunciation learning. The dynamic time warping algorithm is used to compute the cepstral distance between two speech samples, and a codebook distance is subtracted to account for the characteristics of each speaker. Experiments on the Korean fundamental vowel set show that the similarity of two speakers' pronunciation can be computed efficiently by computer.
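
The abstract names dynamic time warping (DTW) over cepstral features with a codebook distance subtracted, but gives no formulas. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each utterance is already a list of cepstral feature vectors, uses a Euclidean frame distance and a simple length normalization, and treats the codebook distance as a precomputed scalar. The function names (`dtw_distance`, `adjusted_distance`) and the exact form of the subtraction are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW distance between two sequences of feature vectors.

    Assumption: symmetric step pattern (insert, delete, match) and
    normalization by len(seq_a) + len(seq_b); the paper may differ.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def adjusted_distance(seq_a, seq_b, codebook_distance):
    """DTW distance minus a speaker-level offset.

    Assumption: codebook_distance is a scalar summarizing how far apart
    the two speakers' codebooks are; how it is computed is not specified here.
    """
    return dtw_distance(seq_a, seq_b) - codebook_distance
```

Under these assumptions, a smaller adjusted distance would indicate more similar pronunciation once overall speaker differences are discounted. Warping lets a repeated frame in one utterance align with a single frame in the other at no extra cost.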

Keywords

  • Speech Recognition
  • Speech Signal
  • Dynamic Time Warping
  • Decision Threshold
  • Confidence Measure

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, D., Yook, D. (2006). Pronunciation Similarity Estimation for Spoken Language Learning. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_46

  • DOI: https://doi.org/10.1007/11940098_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science (R0)