Perception of Speech and Sound

Kollmeier, Birger; Brand, Thomas; Meyer, Bernd

doi:10.1007/978-3-540-49127-9_4

Part of the book series: Springer Handbooks (SHB)

Abstract

The transformation of acoustical signals into auditory sensations can be characterized by psychophysical quantities such as loudness, tonality, or perceived pitch. The resolution limits of the auditory system produce spectral and temporal masking phenomena and impose constraints on the perception of amplitude modulations. Binaural hearing (i.e., exploiting the acoustical differences between the two ears) uses interaural time and intensity differences to produce localization and binaural unmasking phenomena such as the binaural intelligibility level difference (BILD), i.e., the difference in speech reception threshold between listening to speech in noise monaurally and listening with both ears.
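As a small illustration of the binaural intelligibility level difference described above, the following sketch computes it as the difference between two speech reception thresholds. The function name, the sign convention (monaural minus binaural), and the example threshold values are assumptions made for illustration only; they are not measurements or definitions taken from the chapter.

```python
def bild_db(srt_monaural_db: float, srt_binaural_db: float) -> float:
    """Binaural intelligibility level difference in dB.

    Assumed convention: monaural SRT minus binaural SRT, so a positive
    value means the binaural threshold is lower (better).
    """
    return srt_monaural_db - srt_binaural_db


# Hypothetical thresholds in dB SNR, chosen only to show the arithmetic.
example_bild = bild_db(srt_monaural_db=-6.0, srt_binaural_db=-10.0)
print(f"BILD = {example_bild:.1f} dB")
```

With these made-up numbers the listener tolerates 4 dB more noise when listening with both ears.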

The acoustical information available to the listener for perceiving speech, even under adverse conditions, can be characterized using the articulation index, the speech transmission index, and the speech intelligibility index. These indices objectively predict speech reception thresholds as a function of spectral content, signal-to-noise ratio, and the preservation of amplitude modulations in the speech waveform that enters the listener's ear. The articulatory or phonetic information available to and received by the listener can be characterized by speech feature sets. Transinformation analysis makes it possible to determine the relative transmission error associated with each of these speech features. Comparing human and machine speech recognition allows hypotheses and models of human speech perception to be tested. Conversely, automatic speech recognition may be improved by introducing human signal-processing principles into machine processing algorithms.
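To make the idea of transinformation analysis more concrete, the sketch below computes a relative transinformation value from a stimulus-response confusion matrix for a single speech feature. It assumes the common information-theoretic formulation, in which the mutual information between the presented and the perceived feature value is divided by the entropy of the presented values. The function name and the confusion counts are hypothetical, and the chapter itself may define the index differently.

```python
import math


def relative_transinformation(confusions):
    """Return T(x;y) / H(x) for a confusion matrix of counts.

    confusions[i][j] is the number of trials in which feature value i
    was presented and feature value j was reported.
    """
    total = sum(sum(row) for row in confusions)
    if total == 0:
        raise ValueError("confusion matrix contains no trials")

    # Marginal probabilities of presented (x) and reported (y) values.
    p_x = [sum(row) / total for row in confusions]
    p_y = [sum(col) / total for col in zip(*confusions)]

    # Mutual information between stimulus and response, in bits.
    t_bits = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                p_xy = count / total
                t_bits += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))

    # Entropy of the presented feature values, in bits.
    h_x = -sum(p * math.log2(p) for p in p_x if p > 0)
    if h_x == 0:
        raise ValueError("only one feature value was presented")
    return t_bits / h_x


# Hypothetical counts for a binary feature such as voicing.
voicing = [[45, 5],
           [10, 40]]
print(f"relative transinformation: {relative_transinformation(voicing):.2f}")
```

A value of 1 means the feature was transmitted without error and a value of 0 means the responses carry no information about it, so one minus this value can be read as a relative transmission error for that feature.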

Abbreviations

ASR: automatic speech recognition
BILD: binaural intelligibility level difference
CMR: co-modulation masking release
EC: equalization and cancelation
ERB: equivalent rectangular bandwidth
HSR: human speech recognition
JND: just-noticeable difference
RMS: root mean square
SI: speech intelligibility
SII: speech intelligibility index
SNR: signal-to-noise ratio
SPL: sound pressure level
SRT: speech reception threshold
TI: transinformation index

Author information

Authors and Affiliations

Medizinische Physik, Universität Oldenburg, 26111, Oldenburg, Germany
Birger Kollmeier Prof.
Sektion Medizinphysik, Carl von Ossietzky Universität Oldenburg, Haus des Hörens, Marie-Curie-Str. 2, 26121, Oldenburg, Germany
Thomas Brand Dr.
Medical Physics Section, Haus des Hörens, Carl von Ossietzky Universität Oldenburg, Marie-Curie-Str. 2, 26121, Oldenburg, Germany
Bernd Meyer Ph.D

Corresponding authors

Correspondence to Birger Kollmeier Prof., Thomas Brand Dr., or Bernd Meyer Ph.D.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Kollmeier, B., Brand, T., Meyer, B. (2008). Perception of Speech and Sound. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_4

DOI: https://doi.org/10.1007/978-3-540-49127-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
