Skip to main content

Analysis of phonemes and tones confusion rules obtained by ASR


This paper is based on the exploration of the effective method of erroneous phoneme pronunciation of Chinese mandarin learners whose mother tongue is Uyghur and the solution of major problems of language education, concerning the learner’s pronunciation, it uses a different method, namely data-driven approach, and the automatic speech recognition is also used to recognize phonemes of the pronunciation of Chinese mandarin learners. The phoneme sequence is identified and then the standard pronunciation phonemes corresponding to the recognized phonemes are used as the target phonemes to obtain the mapping relation of each target phoneme and recognition phoneme, thus the possible phoneme error categories and possible erroneous rules in pronunciation can be obtained, which may give some help to the learners to learn the Chinese auxiliary language system and the corresponding pronunciation evaluation model.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


  1. Ito, A., Lim, Y.-L., Suzuki, M., & MaKino, S. (2007). Pronunciation error detection for computer-assisted language learning system based on error rule clustering using a decision tree. Acoustical Science and Technology, 28(2), 131–133.

    Article  Google Scholar 

  2. Stanley, T., & Hacioglu, K. (2012). Improving L1-specific phonological error diagnosis in computer assisted pronunciation training. In INTERSPEECH 2012 (pp. 827–830). ISCA.

  3. Wang, Y. B., & Lee, L. S. (2012). Improved approaches of modeling and detecting error patterns with empirical analysis for computer-aided pronunciation training. In ICASSP 2012 (pp. 5049–5052). IEEE.

  4. Jiang, D., Wang, W., Shi, L., & Song, H. (2018). A compressive sensing-based approach to end-to-end network traffic reconstruction. IEEE Transactions on Network Science and Engineering, 5(3), 1–12.

    Google Scholar 

  5. Jiang, M., Jiang, L., Jiang, D., et al. (2017). Dynamic measurement errors prediction for sensors based on firefly algorithm optimize support vector machine. Sustainable Cities and Society, 2017(35), 250–256.

    Article  Google Scholar 

  6. Wang, F., Jiang, D., & Qi, S. (2019). An adaptive routing algorithm for integrated information networks. China Communications, 7(1), 196–207.

    Google Scholar 

  7. Jiang, D., Zhang, P., & Lv, Z. (2016). Energy-efficient multi-constraint routing algorithm with load balancing for smart city applications. IEEE Internet of Things Journal, 3(6), 1437–1447.

    Article  Google Scholar 

  8. Jiang, D., Li, W., & Lv, H. (2017). An energy-efficient cooperative multicast routing in multi-hop wireless networks for smart medical applications. Neurocomputing, 220, 160–169.

    Article  Google Scholar 

  9. Witt, S. M. (1999). Use of speech recognition in computer-assisted language learning. Cambridge: Cambridge University.

    Google Scholar 

  10. Ye, H., & Young, S. J. (2005). Improving the speech recognition performance of beginners in spoken conversational interaction for language learning. In INTERSPEECH 2005 (pp. 289–292). ISCA.

  11. Jiang, D., Huo, L., & Song, H. (2018). Rethinking behaviors and activities of base stations in mobile cellular networks based on big data analysis. IEEE Transactions on Network Science and Engineering, 1(1), 1–12.

    MathSciNet  Google Scholar 

  12. Qian, X. J., Meng, H., & Soong, F. K. (2011). On mispronunciation lexicon generation using joint sequence multigrams in computer-aided pronunciation training (CAPT). In INTERSPEECH 2011 (pp. 865–868). ISCA.

  13. Tsubota, Y., Kawahara, T., & Dantsuji, M. (2002) Recognition and verification of English by Japanese students for computer-assisted language learning system. In Proceedings of ICSLP (pp. 1205–1208).

  14. Oh, Y. R., Yoon, J. S., & Kim, H. K. (2007). Acoustic model adaptation based on pronunciation variability analysis for non-native speech recognition. Speech Communication, 49, 59–70.

    Article  Google Scholar 

  15. Peter Ladefoged, F., & Keith Johnson, S. (2015). A course in phonetics (7th ed.). Haidian: Peking University Press.

    Google Scholar 

  16. Thurgood, G., & La Polla, R. J. (2003). The Sino-Tibetan languages. London: Routledge.

    Google Scholar 

  17. Jiang, D., Huo, L., & Li, Y. (2018). Fine-granularity inference and estimations to network traffic for SDN. PLoS ONE, 13(5), 1–23.

    Google Scholar 

  18. Huo, L., Jiang, D., & Lv, Z. (2018). Soft frequency reuse-based optimization algorithm for energy efficiency of multi-cell networks. Computers & Electrical Engineering, 66(2), 316–331.

    Article  Google Scholar 

  19. Xiangru, Z., & Zhining, Z. (1985). Uyghur language. China: National press.

    Google Scholar 

  20. Shifeng, F. (2009). Experimental phonology exploration. China: Peking University Press.

    Google Scholar 

  21. Lo, W. K., Zhang, S., & Meng, H. M. (2010). Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system. In INTERSPEECH 2010 (pp. 765–768).

  22. Zhu, J., Song, Y., Jiang, D., et al. (2018). A new deep-Q-learning-based transmission scheduling mechanism for the cognitive Internet of things. IEEE Internet of Things Journal, 5(4), 2375–2385.

    Article  Google Scholar 

  23. Troung, K. F. (2004). Automatic pronunciation error detection in Dutch as a second language: An acoustic-phonetic approach. Utrecht: Utrecht University.

    Google Scholar 

  24. Jiang, D., Wang, Y., Lv, Z., et al. (2019). Big data analysis-based network behavior insight of cellular networks for industry 4.0 applications. IEEE Transactions on Industrial Informatics.

    Article  Google Scholar 

  25. Jiang, D., Huo, L., Lv, Z., et al. (2018). A joint multi-criteria utility-based network selection approach for vehicle-to-infrastructure networking. IEEE Transactions on Intelligent Transportation Systems, 19(10), 3305–3319.

    Article  Google Scholar 

  26. Huo, L., & Jiang, D. (2019). Stackelberg game-based energy-efficient resource allocation for 5G cellular networks. Telecommunication System, 23(4), 1–11.

    Google Scholar 

  27. Sun, M., Jiang, D., Song, H., et al. (2017). Statistical resolution limit analysis of two closely-spaced signal sources using Rao test. IEEE Access, 2017(5), 22013–22022.

    Article  Google Scholar 

  28. Dong, B., & Zhao, Q. W. (2006). Automatic scoring of flat tongue and raised tongue in computer-assisted mandarin learning. In ISCSLP 2006 (pp. 2–7). IEEE.

  29. Chen, L., Jiang, D., Song, H., et al. (2018). A lightweight end-side user experience data collection system for quality evaluation of multimedia communications. IEEE Access, 6(2018), 15408–15419.

    Article  Google Scholar 

  30. Sun, M., Jiang, D., Song, H., et al. (2017). Statistical resolution limit analysis of two closely-spaced signal sources using Rao test. IEEE Access, 5, 22013–22022.

    Article  Google Scholar 

  31. Wang, S. J., & Li, H. Y. (2011). Research on the evaluation of spoken language scale intelligence for second language learning. Chinese Journal of information science, 25(6), 142–148.

    MathSciNet  Google Scholar 

  32. Gass, S., & Selinker, L. (1992). Language transfer in language learning (pp. 22–113). Amsterdam: John Benjamins Publishing Company.

    Book  Google Scholar 

  33. Wang, L., Feng, X., & Meng, H. M. (2008). Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training. In INTERSPEECH 2008 (pp. 22– 26).

  34. Wang, L., Feng, X., & Meng, H. M. (2008). Mispronunciation detection based on cross-language phonological comparisons. In ICALIP 2008 (pp. 307–311).

  35. Arkin, G., & Hamdulla, A. (2018). Tone investigation of non-native Chinese speakers based on acoustic features. Technical Acoustics, 37(6), 572–578.

    Google Scholar 

  36. Arkin, G., & Hamdulla, A. (2018). Tone analysis of non-native Chinese speakers based on rules and statistics. Journal of Applied Acoustics, 37(3), 366–372.

    Google Scholar 

Download references


This work was supported by the National Natural Science Foundation of China (NSFC; Grants 61662078, and 61633013), National Key Research and Development Plan of China (2017YFC0820602).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Askar Hamdulla.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Arkin, G., Hamdulla, A. & Ablimit, M. Analysis of phonemes and tones confusion rules obtained by ASR. Wireless Netw 27, 3471–3481 (2021).

Download citation

  • Published:

  • Issue Date:

  • DOI: