Deep Neural Network-Based Noise Estimation for Robust ASR in Dual-Microphone Smartphones

López-Espejo, Iván; Peinado, Antonio M.; Gomez, Angel M.; Martín-Doñas, Juan M.

doi:10.1007/978-3-319-49169-1_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10077)

Included in the following conference series:

International Conference on Advances in Speech and Language Technologies for Iberian Languages

Abstract

The performance of many noise-robust automatic speech recognition (ASR) methods, such as vector Taylor series (VTS) feature compensation, depends heavily on an estimate of the noise that contaminates the speech. Providing accurate noise estimates for such methods is therefore both crucial and challenging. In this paper we investigate the use of deep neural networks (DNNs) to perform noise estimation in dual-microphone smartphones. Thanks to the powerful regression capabilities of DNNs, accurate noise estimates can be obtained using only simple features together with the power level difference (PLD) between the two microphones of the smartphone when it is used in close-talk conditions. This is confirmed by our word recognition results on the AURORA2-2C (AURORA2 - 2 Channels - Conversational Position) database, where the proposed approach, combined with a VTS feature compensation method, largely outperforms state-of-the-art single- and dual-channel noise estimation algorithms.
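
To make the PLD idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact feature definition, so the per-frame PLD in dB used here, the function names (`frame_power_spectrum`, `power_level_difference_db`), the frame length and hop, the synthetic white-noise signals, and the 0.25 attenuation at the secondary microphone are all illustrative assumptions. It only shows why, in close-talk use, frames dominated by speech tend to have a large PLD while diffuse noise alone yields a PLD near 0 dB.

```python
import numpy as np


def frame_power_spectrum(signal, frame_len=256, hop=128):
    """Return per-frame power spectra, shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def power_level_difference_db(primary, secondary, eps=1e-12):
    """Per-frame PLD in dB between the two channels (assumed definition)."""
    p1 = frame_power_spectrum(primary).sum(axis=1)
    p2 = frame_power_spectrum(secondary).sum(axis=1)
    return 10.0 * np.log10((p1 + eps) / (p2 + eps))


# Synthetic stand-ins: white noise plays the role of both speech and noise.
rng = np.random.default_rng(0)
n_samples = 8000
speech = rng.standard_normal(n_samples)
noise = rng.standard_normal(n_samples)

# Assumption: the speech is strongly attenuated at the secondary microphone,
# while the noise reaches both microphones with the same power.
primary = speech + 0.1 * noise
secondary = 0.25 * speech + 0.1 * noise

pld_speech = power_level_difference_db(primary, secondary)
pld_noise = power_level_difference_db(0.1 * noise, 0.1 * noise)

print("mean PLD with speech present (dB):", pld_speech.mean())
print("mean PLD with noise only (dB):", pld_noise.mean())
```

In a DNN-based estimator, features of this kind from both channels would be the network input and the noise spectrum the regression target; that training step is outside the scope of this sketch.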

This work has been supported by the Spanish MINECO TEC2013-46690-P project.

Author information

Authors and Affiliations

Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
Iván López-Espejo, Antonio M. Peinado, Angel M. Gomez & Juan M. Martín-Doñas

Corresponding author

Correspondence to Iván López-Espejo.

Editor information

Editors and Affiliations

INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Alberto Abad
I3A/University of Zaragoza, Zaragoza, Spain
Alfonso Ortega
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira
AtlantTIC Research Center, Universidad de Vigo, Vigo, Spain
Carmen García Mateo
Universitat Politècnica de València, Valencia, Spain
Carlos D. Martínez Hinarejos
University of Coimbra, Coimbra, Portugal
Fernando Perdigão
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Nuno Mamede

About this paper

Cite this paper

López-Espejo, I., Peinado, A.M., Gomez, A.M., Martín-Doñas, J.M. (2016). Deep Neural Network-Based Noise Estimation for Robust ASR in Dual-Microphone Smartphones. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_12

DOI: https://doi.org/10.1007/978-3-319-49169-1_12
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)
