Medical & Biological Engineering & Computing, Volume 57, Issue 6, pp 1393–1403

Cochlea-inspired speech recognition interface

  • Mladen Russo
  • Maja Stella
  • Marjan Sikora
  • Matko Šarić
Original Article


Automatic speech recognition (ASR) technology provides a natural interface for human-machine interaction. Typical ASR systems achieve high performance in quiet environments but, unlike humans, perform poorly in real-world situations. To better simulate the human auditory periphery and improve performance in realistic noisy scenarios, we propose two speech recognition front-ends based on a biophysical cochlear model. The first front-end reconstructs the signal from the basilar membrane response. Applied to noisy speech, this reconstruction improves signal quality, so it can serve both as a preprocessing step in a standard ASR system and as a noise reduction technique for other applications. The second front-end constructs speech recognition coefficients directly from the basilar membrane response. Experimental results with a continuous-density hidden Markov model (HMM) recognizer show a significant performance improvement over standard Mel-frequency cepstral coefficients (MFCCs) under various types of noise.
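The abstract names the two front-ends without giving their formulas. The following is only a minimal sketch of the general shape of each, under loud assumptions: the article's biophysical cochlear model is replaced by a gammatone filterbank (a common phenomenological approximation of basilar membrane filtering, not the model used in the article); "reconstruction" is plain analysis-synthesis by time-reversed filtering and summation; and the coefficients are log channel energies followed by a DCT-II, in the style of MFCC computation. All function names (`analyze`, `reconstruct`, `cochlear_cepstra`, etc.) and parameter values (32 channels, 100 to 3500 Hz, 25 ms frames, 10 ms hop) are hypothetical.

```python
import numpy as np

# ASSUMPTION: a gammatone filterbank stands in for the article's biophysical cochlear model.

def erb(fc):
    """Glasberg & Moore equivalent rectangular bandwidth (Hz) at centre frequency fc."""
    return 24.7 * (4.37e-3 * fc + 1.0)

def center_freqs(n, fmin, fmax):
    """n centre frequencies spaced uniformly on the ERB-number scale."""
    to_erbn = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    from_erbn = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return from_erbn(np.linspace(to_erbn(fmin), to_erbn(fmax), n))

def gammatone_ir(fc, fs, dur=0.05, order=4):
    """Unit-energy 4th-order gammatone impulse response, truncated to `dur` seconds."""
    t = np.arange(int(round(dur * fs))) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def analyze(x, fs, fcs):
    """Hypothetical stand-in for the BM response: one causal filter output per place."""
    irs = [gammatone_ir(fc, fs) for fc in fcs]
    y = np.stack([np.convolve(x, h)[: len(x)] for h in irs])
    return y, irs

def reconstruct(y, irs):
    """Front-end 1 sketch: filter each channel with its time-reversed IR and sum."""
    n = y.shape[1]
    out = np.zeros(n)
    for ch, h in zip(y, irs):
        full = np.convolve(ch, h[::-1])           # anti-causal (time-reversed) filtering
        out += full[len(h) - 1 : len(h) - 1 + n]  # re-align to the input time axis
    return out

def cochlear_cepstra(y, fs, n_ceps=13, frame=0.025, hop=0.010):
    """Front-end 2 sketch: log frame energy per channel followed by a DCT-II."""
    n_ch, n = y.shape
    flen, step = int(round(frame * fs)), int(round(hop * fs))
    n_frames = 1 + (n - flen) // step
    energy = np.array([[np.sum(ch[i * step : i * step + flen] ** 2)
                        for i in range(n_frames)] for ch in y])
    log_e = np.log(energy + 1e-10)                # shape (n_ch, n_frames)
    q = np.arange(n_ceps)[:, None]
    k = np.arange(n_ch)[None, :]
    dct = np.cos(np.pi / n_ch * (k + 0.5) * q)    # DCT-II basis, shape (n_ceps, n_ch)
    return dct @ log_e                            # shape (n_ceps, n_frames)

# Usage: a 440 Hz tone in white noise, sampled at 8 kHz
fs = 8000
t = np.arange(int(0.5 * fs)) / fs
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(t.size)
fcs = center_freqs(32, 100.0, 3500.0)
y, irs = analyze(noisy, fs, fcs)
x_hat = reconstruct(y, irs)
feats = cochlear_cepstra(y, fs)
print(feats.shape)  # (13, 48)
```

Time-reversed synthesis makes each channel's combined response zero-phase, so a tone inside the passband comes back scaled but without phase distortion. This sketch does no noise suppression beyond band-limiting and is not expected to reproduce the signal-quality gains described in the abstract.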

Graphical Abstract

Speech recognition model based on cochlear front-end


Keywords: Speech recognition interface · Biophysical cochlear model · Noise robustness


Funding information

This work has been fully supported by the Croatian Science Foundation under project number UIP-2014-09-3875.



Copyright information

© International Federation for Medical and Biological Engineering 2019

Authors and Affiliations

  1. Laboratory for Smart Environment Technologies, FESB - University of Split, Split, Croatia
