Merging of Native and Non-native Speech for Low-resource Accented ASR

  • Sarah Samson Juan
  • Laurent Besacier
  • Benjamin Lecouteux
  • Tien-Ping Tan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9449)


This paper presents our recent study on a low-resource automatic speech recognition (ASR) system for accented speech. We propose multi-accent Subspace Gaussian Mixture Models (SGMM) and accent-specific Deep Neural Networks (DNN) to improve non-native ASR performance. In the SGMM framework, we present an original language weighting strategy to merge the globally shared parameters of two models, one based on native speech and the other on non-native speech. In the DNN framework, a native deep neural network is fine-tuned to non-native speech. Over the non-native baseline, we achieved relative improvements of 15% for the multi-accent SGMM and 34% for the accent-specific DNN with speaker adaptation.


Automatic speech recognition · Cross-lingual acoustic modelling · Non-native speech · Low-resource system · Multi-accent SGMM · Accent-specific DNN



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Sarah Samson Juan
    • 1
  • Laurent Besacier
    • 2
  • Benjamin Lecouteux
    • 2
  • Tien-Ping Tan
    • 3
  1. Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, Kota Samarahan, Malaysia
  2. Grenoble Informatics Laboratory (LIG), University Grenoble-Alpes, Grenoble, France
  3. School of Computer Science, Universiti Sains Malaysia, Gelugor, Malaysia
