Cochlea-inspired speech recognition interface

Russo, Mladen; Stella, Maja; Sikora, Marjan; Šarić, Matko

doi:10.1007/s11517-019-01963-6

Original Article
Published: 04 March 2019

Volume 57, pages 1393–1403 (2019)

Medical & Biological Engineering & Computing

Mladen Russo ORCID: orcid.org/0000-0002-9363-6723¹,
Maja Stella¹,
Marjan Sikora¹ &
…
Matko Šarić¹

Abstract

Automatic speech recognition (ASR) technology provides a natural interface for human-machine interaction. Typical ASR systems can achieve high performance in quiet environments but, unlike humans, perform poorly in real-world situations. To better simulate the human auditory periphery and improve the performance in realistic noisy scenarios, we propose two models of speech recognition front-ends based on a biophysical cochlear model. The first front-end is based on the method of signal reconstruction from a basilar membrane response. When applied to noisy speech, this method results in improved signal quality. This method can be used as a preprocessing step in a standard ASR system and can also be used as a noise reduction technique for other applications. The second front-end we propose is based on the construction of speech recognition coefficients directly from a basilar membrane response. Experimental results using a continuous-density hidden Markov model (HMM) recognizer demonstrate significant improvement in performance compared to standard Mel-frequency cepstral coefficients (MFCC) in various types of noisy conditions.
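
To make the idea of the second front-end more concrete, the following is a minimal sketch of how recognition coefficients could be derived from a multi-channel cochlear response. It is not the authors' method: the abstract does not specify how the coefficients are constructed, so the framing, the log-energy step, the DCT, and all names and parameters (`frame_log_energy`, `dct_ii`, `cepstral_like_features`, `frame_len`, `hop`, `n_coeffs`) are assumptions borrowed from the conventional MFCC pipeline. The input `response` is a hypothetical array of basilar membrane responses, one row per cochlear position, and is filled here with random noise as a stand-in for a real cochlear model output.

```python
import numpy as np


def frame_log_energy(response, frame_len, hop):
    """Log energy per frame and channel.

    response: array of shape (n_channels, n_samples), assumed to hold the
    basilar membrane response at n_channels cochlear positions.
    Returns an array of shape (n_frames, n_channels).
    """
    n_channels, n_samples = response.shape
    n_frames = 1 + (n_samples - frame_len) // hop
    energies = np.empty((n_frames, n_channels))
    for t in range(n_frames):
        segment = response[:, t * hop : t * hop + frame_len]
        energies[t] = np.sum(segment ** 2, axis=1)
    # Small offset avoids log(0) on silent channels.
    return np.log(energies + 1e-10)


def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # (n_coeffs, n)
    return x @ basis.T


def cepstral_like_features(response, frame_len=400, hop=160, n_coeffs=13):
    """Decorrelate per-channel log energies into cepstral-like coefficients."""
    return dct_ii(frame_log_energy(response, frame_len, hop), n_coeffs)


# Synthetic stand-in for a cochlear model output: 32 channels, 16000 samples.
rng = np.random.default_rng(0)
response = rng.standard_normal((32, 16000))
features = cepstral_like_features(response)
print(features.shape)  # (98, 13)
```

Each row of `features` is one frame-level feature vector of the kind an HMM recognizer would consume in place of MFCC vectors. The DCT step mirrors standard cepstral analysis and is only one plausible choice for decorrelating the channel energies.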

Funding

This work has been fully supported by the Croatian Science Foundation under project number UIP-2014-09-3875.

Author information

Authors and Affiliations

Laboratory for Smart Environment Technologies, FESB - University of Split, Split, Croatia
Mladen Russo, Maja Stella, Marjan Sikora & Matko Šarić

Corresponding author

Correspondence to Mladen Russo.

About this article

Cite this article

Russo, M., Stella, M., Sikora, M. et al. Cochlea-inspired speech recognition interface. Med Biol Eng Comput 57, 1393–1403 (2019). https://doi.org/10.1007/s11517-019-01963-6

Received: 04 October 2018
Accepted: 14 February 2019
Published: 04 March 2019
Issue Date: 19 June 2019
DOI: https://doi.org/10.1007/s11517-019-01963-6
