Auditory-like filterbank: An optimal speech processor for efficient human speech communication

GHOSH, PRASANTA KUMAR; GOLDSTEIN, LOUIS M; NARAYANAN, SHRIKANTH S

doi:10.1007/s12046-011-0042-4

Sadhana, Volume 36, pages 699–712 (2011)

Published: 22 November 2011

Abstract

The transmitter and the receiver in a communication system have to be designed optimally with respect to one another to ensure reliable and efficient communication. Following this principle, we derive an optimal filterbank for processing the speech signal in the listener’s auditory system (the receiver), so that the filterbank output carries maximum information about the talker’s (the transmitter’s) message, leading to efficient communication between the talker and the listener. We consider speech data of 45 talkers from three different languages and design optimal filterbanks separately for each of them. We find that the computationally derived optimal filterbanks are similar to the empirically established auditory (cochlear) filterbank in the human ear. We also find that the output of the empirically established auditory filterbank provides more than 90% of the maximum information about the talker’s message that the output of the optimal filterbank provides. Our experimental findings suggest that the auditory filterbank in the human ear functions as a near-optimal speech processor for achieving efficient speech communication between humans.
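
To make the term "auditory-like filterbank" concrete, the following Python sketch computes band edges that are equally spaced on the critical-band-rate (Bark) axis, using the approximation of Zwicker and Terhardt (1980). It is only an illustration of how an auditory filterbank allocates narrow bands to low frequencies and wide bands to high frequencies. It is not the optimization procedure of the article, and the function names, the 4000 Hz upper limit and the choice of 8 bands are assumptions made for this example.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker-Terhardt 1980 approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_to_hz(z, lo=0.0, hi=20000.0, tol=1e-6):
    """Invert hz_to_bark by bisection; the mapping increases with frequency."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def band_edges(f_max_hz, n_bands):
    """Edges (Hz) of n_bands bands equally spaced in Bark from 0 Hz to f_max_hz."""
    z_max = hz_to_bark(f_max_hz)
    return [bark_to_hz(z_max * k / n_bands, hi=f_max_hz) for k in range(n_bands + 1)]


if __name__ == "__main__":
    # Hypothetical settings chosen for illustration only.
    edges = band_edges(4000.0, 8)
    for low, high in zip(edges, edges[1:]):
        print(f"{low:8.1f} Hz .. {high:8.1f} Hz  (width {high - low:7.1f} Hz)")
```

Running the script prints bands whose widths in Hz grow with frequency, which is the qualitative property that distinguishes an auditory-like filterbank from a uniformly spaced one.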

Author information

Authors and Affiliations

Signal Analysis and Interpretation Laboratory, Department of Electrical Engineering, University of Southern California, Los Angeles, CA, 90089, USA
PRASANTA KUMAR GHOSH & SHRIKANTH S NARAYANAN
Department of Linguistics, University of Southern California, Los Angeles, CA, 90089, USA
LOUIS M GOLDSTEIN

Corresponding author

Correspondence to PRASANTA KUMAR GHOSH.

About this article

Cite this article

GHOSH, P.K., GOLDSTEIN, L.M. & NARAYANAN, S.S. Auditory-like filterbank: An optimal speech processor for efficient human speech communication. Sadhana 36, 699–712 (2011). https://doi.org/10.1007/s12046-011-0042-4

Published: 22 November 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s12046-011-0042-4
