Perceptual Evaluation of Pronunciation Quality for Computer Assisted Language Learning
In this paper, we propose a novel method for the perceptual evaluation of pronunciation quality in Computer Assisted Language Learning (CALL) for e-learning. The overall pronunciation score combines three components: a matching score, a perceptual score, and an asymmetric score. The matching score measures the acoustic distortion of the test speech. The perceptual score models the distortion as human listeners perceive it, in the perceptual domain. The asymmetric score captures the fact that deletion errors and insertion errors in spoken English are perceived differently. The correlation coefficient between the predicted objective scores and the experts' subjective scores is 0.75, which compares favorably with current HMM-based methods.
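The abstract does not state how the three component scores are combined, how the asymmetric score is computed, or which correlation coefficient is reported. The following is a minimal Python sketch under loudly stated assumptions. It uses a weighted linear combination with hypothetical weights and a hypothetical asymmetric score that penalizes deletions and insertions with different weights. It takes the reported correlation to be Pearson's coefficient. It illustrates the structure described above and is not the authors' method.

```python
from math import sqrt
from typing import Sequence


def asymmetric_score(n_deletions: int, n_insertions: int, n_phones: int,
                     w_del: float = 1.0, w_ins: float = 0.5) -> float:
    """HYPOTHETICAL asymmetric score in [0, 1] (1 = no errors).

    Deletions and insertions are penalized with different weights to model
    their asymmetric perceptual effect. The weights are illustrative
    assumptions, not values from the paper.
    """
    if n_phones <= 0:
        raise ValueError("n_phones must be positive")
    penalty = (w_del * n_deletions + w_ins * n_insertions) / n_phones
    return max(0.0, 1.0 - penalty)


def overall_score(matching: float, perceptual: float, asymmetric: float,
                  weights: Sequence[float] = (0.4, 0.4, 0.2)) -> float:
    """HYPOTHETICAL linear combination of the three component scores."""
    a, b, c = weights
    return a * matching + b * perceptual + c * asymmetric


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and subjective (expert) scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Usage: perfectly linearly related scores give a coefficient of 1.0.
print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```

In practice the weights would be fitted on utterances rated by experts. The resulting Pearson coefficient between `overall_score` outputs and the expert ratings is the kind of figure the abstract reports (0.75).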
Keywords: Mean Opinion Score; Acoustic Model; Subjective Score; Critical Band; Perceptual Evaluation