Abstract
This paper addresses pronunciation quality assessment for Mandarin vowels. Phonetic pronunciation quality is traditionally evaluated within a speech recognition framework using the phonetic posterior probability score, which can be computed either by normalizing frame-based posterior probabilities or directly on the phone segment. The first method achieves a human-machine scoring correlation coefficient (CC) of 0.832 for vowels; the second reaches a CC of up to 0.847. This paper proposes a novel formant feature and applies it to vowel evaluation: the formant plots on the time-frequency plane are transformed into a bitmap, and Gabor features are extracted from the bitmap for pattern classification. Using the classification probability for pronunciation assessment yields a CC of 0.842. Finally, the three scores are combined with various linear and nonlinear methods; the best CC, 0.913, is obtained with a neural network.
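The two posterior-based scores and the human-machine correlation coefficient mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equal phone priors, treats frames as independent, and takes per-frame log-likelihoods as plain dictionaries. The function names (`frame_posterior_score`, `segment_posterior_score`, `pearson_cc`) and the data layout are hypothetical choices made for this illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def frame_posterior_score(frame_loglikes, target):
    """Frame-based score: average over frames of log P(target | frame).

    frame_loglikes is a list with one dict per frame, mapping each phone
    label to log p(frame | phone). Equal phone priors are assumed.
    """
    total = 0.0
    for loglikes in frame_loglikes:
        total += loglikes[target] - log_sum_exp(list(loglikes.values()))
    return total / len(frame_loglikes)


def segment_posterior_score(frame_loglikes, target):
    """Segment-based score: log posterior of the target phone computed on
    the whole segment, then normalized by the segment duration in frames.

    Assumption: the segment log-likelihood of a phone is the sum of its
    frame log-likelihoods (frames treated as independent).
    """
    phones = list(frame_loglikes[0].keys())
    seg = {p: sum(frame[p] for frame in frame_loglikes) for p in phones}
    log_post = seg[target] - log_sum_exp(list(seg.values()))
    return log_post / len(frame_loglikes)


def pearson_cc(x, y):
    """Pearson correlation coefficient between machine and human scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

In use, each vowel segment would receive one machine score from either function, and `pearson_cc` would then be applied to the list of machine scores and the corresponding list of human ratings. The frame-based variant normalizes at every frame before averaging, while the segment-based variant normalizes once over the whole segment, so the two generally differ whenever the competing phones' likelihoods vary across frames.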
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pan, F., Zhao, Q., Yan, Y. (2007). New Machine Scores and Their Combinations for Automatic Mandarin Phonetic Pronunciation Quality Assessment. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74819-9_101
DOI: https://doi.org/10.1007/978-3-540-74819-9_101
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74817-5
Online ISBN: 978-3-540-74819-9
eBook Packages: Computer Science, Computer Science (R0)