Noise Robust Automatic Scoring Based on Deep Neural Network Acoustic Models with Lattice-Free MMI and Factorized Adaptation

Mobile Networks and Applications

Abstract

Automatic scoring based on Automatic Speech Recognition (ASR) is widely used in second-language (L2) speaking tests. This paper proposes novel noise-robust automatic scoring methods for L2 speaking tests based on Deep Neural Network (DNN) acoustic models with lattice-free Maximum Mutual Information (MMI) and factorized adaptation. Noise-robust Goodness of Pronunciation (GOP) algorithms using lattice-free MMI are implemented to improve the reliability of automatic scoring for L2 speaking tests by better exploiting the sequence-level training of lattice-free MMI models. Factorized adaptation of the DNN acoustic models is introduced to further improve the performance of the proposed GOP scores in real speaking-test environments by categorizing the factors that cause mismatches between the acoustic models and the test data. Experimental results show that the proposed methods are noise robust and outperform conventional methods in assessing speaking tests in real classroom environments.
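For readers unfamiliar with GOP, the following is a minimal sketch of one common posterior-based formulation, not the noise-robust variant proposed in the paper. It assumes that frame-level log phone posteriors from some acoustic model are already available for a force-aligned segment; the function name `gop_score` and the input layout (frames by phones) are hypothetical choices made for illustration.

```python
import numpy as np


def gop_score(log_posteriors, target_phone):
    """Illustrative GOP for one aligned phone segment (assumed formulation).

    log_posteriors: array of shape (T, P) holding frame-level log phone
        posteriors for the T frames aligned to the segment.
    target_phone: index of the canonical (expected) phone.

    Returns the mean log posterior of the target phone minus the best
    mean log posterior over all phones. The score is 0 when the target
    phone is the best-scoring phone and negative otherwise.
    """
    lp = np.asarray(log_posteriors, dtype=float)
    if lp.ndim != 2 or lp.shape[0] == 0:
        raise ValueError("expected a non-empty (frames, phones) array")
    if not 0 <= target_phone < lp.shape[1]:
        raise ValueError("target_phone index out of range")

    # Average over frames so the score does not depend on segment length.
    mean_lp = lp.mean(axis=0)
    return float(mean_lp[target_phone] - mean_lp.max())


# Toy posteriors for a 2-frame segment over 3 phones (made-up values).
toy = np.log(np.array([[0.7, 0.2, 0.1],
                       [0.6, 0.3, 0.1]]))
score_correct = gop_score(toy, target_phone=0)
score_wrong = gop_score(toy, target_phone=1)
```

In this toy case the first phone dominates both frames, so scoring it as the target gives the maximum score of zero, while scoring the second phone as the target gives a negative value. A scoring system would typically threshold or regress on such values; how the paper does so is not stated in the abstract.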

Acknowledgements

This work is supported by the Department of Education of Guangdong Province (Number: 2020KTSCX301).

Author information

Corresponding author

Correspondence to Dean Luo.

Cite this article

Luo, D., Xia, L. & Guan, M. Noise Robust Automatic Scoring Based on Deep Neural Network Acoustic Models with Lattice-Free MMI and Factorized Adaptation. Mobile Netw Appl 27, 1604–1611 (2022). https://doi.org/10.1007/s11036-021-01878-3

  • DOI: https://doi.org/10.1007/s11036-021-01878-3
