Domain adaptation of lattice-free MMI based TDNN models for speech recognition

Long, Yanhua; Li, Yijie; Ye, Hone; Mao, Hongwei

doi:10.1007/s10772-017-9399-z

Published: 01 February 2017

Volume 20, pages 171–178 (2017)

International Journal of Speech Technology

Yanhua Long¹, Yijie Li², Hone Ye¹ & Hongwei Mao¹

Abstract

The recently proposed time-delay deep neural network (TDNN) acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion have been shown to give significant performance improvements over other deep neural network (DNN) models in a variety of speech recognition tasks. Meanwhile, Kullback–Leibler divergence (KLD) regularization has been validated as an effective adaptation method for DNN acoustic models. However, to the best of our knowledge, no work has investigated whether the KLD-based method is also effective for LF-MMI based TDNN models, especially for domain adaptation. In this study, we generalized KLD-regularized model adaptation to train domain-specific TDNN acoustic models. A few distinct and important observations have been obtained. Experiments were performed on Cantonese-accented, in-car, and far-field noisy Mandarin speech recognition tasks. Results demonstrated that the proposed domain-adapted models can achieve a relative word error rate reduction of around 7–29% on these tasks, even when only around 1K adaptation utterances are available.
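As a rough illustration of the two quantities the abstract relies on, the sketch below shows KLD regularization in its commonly used frame-level form for cross-entropy-trained DNNs, where the adaptation target is an interpolation between the hard label and the posterior of the unadapted model, together with the arithmetic behind a relative word error rate reduction. This is not the LF-MMI generalization developed in the article, whose details are not given on this page. The function names, the interpolation weight `rho`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def kld_regularized_targets(one_hot, base_posteriors, rho):
    """Interpolate hard labels with the unadapted model's posteriors.

    rho = 0 ignores the unadapted model (plain fine-tuning);
    rho = 1 keeps the adapted model tied to the unadapted outputs.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return [(1.0 - rho) * t + rho * p for t, p in zip(one_hot, base_posteriors)]


def cross_entropy(targets, predicted):
    """Cross-entropy of predicted posteriors against (possibly soft) targets."""
    return -sum(t * math.log(p) for t, p in zip(targets, predicted) if t > 0.0)


def relative_wer_reduction(baseline_wer, adapted_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical three-class frame: the label is class 1, and the unadapted
# model assigns posteriors (0.2, 0.5, 0.3) to the three classes.
targets = kld_regularized_targets([0.0, 1.0, 0.0], [0.2, 0.5, 0.3], rho=0.5)
loss = cross_entropy(targets, [0.1, 0.8, 0.1])

# Hypothetical WERs: a drop from 20.0% to 16.0% is a 20% relative reduction.
reduction = relative_wer_reduction(20.0, 16.0)
```

Because the soft targets still sum to one, the adapted model can be trained with an unchanged cross-entropy loss; only the targets differ from ordinary fine-tuning.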

Acknowledgements

This work was funded by the Shanghai Science and Technology Development Funds (Grant No. 14YF1409300) and the Research Foundation of Young Teachers Program in Universities of Shanghai (Grant No. ZZshsf14026). Thanks to Beijing Unisound Information Technology Co., Ltd (http://www.unisound.com/) for providing the data sets for system training and testing.

Author information

Authors and Affiliations

Department of Electronical and Information Engineering, Shanghai Normal University, Shanghai, 200234, China
Yanhua Long, Hone Ye & Hongwei Mao
Beijing Unisound Information Technology Co., Ltd., Beijing, 100191, China
Yijie Li

Corresponding author

Correspondence to Yanhua Long.

About this article

Cite this article

Long, Y., Li, Y., Ye, H. et al. Domain adaptation of lattice-free MMI based TDNN models for speech recognition. Int J Speech Technol 20, 171–178 (2017). https://doi.org/10.1007/s10772-017-9399-z

Received: 06 October 2016
Accepted: 10 January 2017
Published: 01 February 2017
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10772-017-9399-z
