Evaluation of English Speech Recognition for Japanese Learners Using DNN-Based Acoustic Models

  • Jiang Fu
  • Yuya Chiba
  • Takashi Nose
  • Akinori Ito
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 110)

Abstract

For computer-assisted language learning (CALL) systems to make foreign language learning easier, they must recognize the learner's utterances with high accuracy. The quality of a CALL system therefore depends largely on the accuracy of its automatic speech recognition (ASR) component. However, because the pronunciation of non-native speakers differs greatly from that of native speakers, existing ASR systems cannot recognize their speech accurately. To address this problem, this research proposes an acoustic model based on deep neural networks (DNN), trained on the ERJ (English Read by Japanese) database collected from 202 Japanese learners. Compared with conventional ASR systems, the proposed system significantly improves speech recognition accuracy.
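The abstract reports improved recognition accuracy but does not name the metric. A common choice in ASR evaluation is word error rate (WER), the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn the recognizer's output into the reference transcription, divided by the number of reference words N; word accuracy is then 1 − WER. The sketch below assumes that definition and computes it with a word-level Levenshtein alignment. It is an illustration only, not the paper's scoring procedure, and the function names and example sentence are hypothetical.

```python
from typing import Sequence


def word_errors(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum S + D + I (Levenshtein distance computed over words)."""
    # prev[j] = edit distance between the reference prefix processed so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)  # 0 cost if words match
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")
    return word_errors(ref, hyp) / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER (negative when insertions are numerous)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Hypothetical learner utterance with two r/l substitutions.
    ref = "I would like to read this book"
    hyp = "I would rike to lead this book"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
    print(f"Word accuracy = {word_accuracy(ref, hyp):.3f}")
```

Because insertions are counted, word accuracy under this definition can fall below zero, which is why some evaluations report WER directly instead.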

Keywords

Speech recognition · Deep neural networks · ERJ database · CALL

Notes

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP17H00823.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jiang Fu (1)
  • Yuya Chiba (1)
  • Takashi Nose (1)
  • Akinori Ito (1)
  1. Graduate School of Engineering, Tohoku University, Sendai, Japan