Abstract
The transformation of acoustical signals into auditory sensations can be characterized by psychophysical quantities such as loudness, tonality, or perceived pitch. The resolution limits of the auditory system produce spectral and temporal masking phenomena and impose constraints on the perception of amplitude modulations. Binaural hearing (i.e., exploiting the acoustical differences between the two ears) employs interaural time and intensity differences to produce localization and binaural unmasking phenomena such as the binaural intelligibility level difference, i.e., the difference in speech reception threshold between listening to speech in noise monaurally and listening with both ears.
The acoustical information available to the listener for perceiving speech, even under adverse conditions, can be characterized using the articulation index, the speech transmission index, and the speech intelligibility index. These indices objectively predict speech reception thresholds as a function of spectral content, signal-to-noise ratio, and preservation of amplitude modulations in the speech waveform that enters the listener's ear. The articulatory or phonetic information available to and received by the listener can be characterized by speech feature sets. Transinformation analysis allows one to detect the relative transmission error connected with each of these speech features. Comparing human and machine speech recognition allows one to test hypotheses and models of human speech perception. Conversely, automatic speech recognition may be improved by introducing human signal-processing principles into machine processing algorithms.
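As a rough illustration of the transinformation analysis mentioned above, the following sketch computes the relative transmitted information for a single speech feature from a confusion matrix. It assumes the listener's responses have already been pooled by feature value (for example voiced versus unvoiced) and applies the standard mutual-information definition, normalized by the stimulus entropy. The function name and the example counts are hypothetical and are not taken from the chapter.

```python
import math


def relative_transinformation(confusions):
    """Return T(x;y) / H(x) for a stimulus-by-response confusion matrix.

    confusions[i][j] counts how often stimulus category i was reported
    as response category j. The result is 1.0 when the feature is
    transmitted perfectly and 0.0 when responses are independent of
    the stimulus.
    """
    total = sum(sum(row) for row in confusions)
    joint = [[count / total for count in row] for row in confusions]
    p_stim = [sum(row) for row in joint]            # stimulus marginal p(x)
    p_resp = [sum(col) for col in zip(*joint)]      # response marginal p(y)

    # Transmitted information in bits; empty cells contribute nothing.
    transmitted = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                transmitted += p * math.log2(p / (p_stim[i] * p_resp[j]))

    # Stimulus entropy in bits, the maximum that could be transmitted.
    entropy = -sum(p * math.log2(p) for p in p_stim if p > 0)
    return transmitted / entropy


# Hypothetical counts for a binary voicing feature (rows: presented,
# columns: perceived), invented purely for illustration.
voicing = [[45, 5],
           [10, 40]]
print(f"relative transinformation: {relative_transinformation(voicing):.3f}")
```

Repeating the calculation with the responses regrouped by another feature (for example place of articulation) would give a per-feature profile of which phonetic information survives a given listening condition.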
Abbreviations
- ASR: automatic speech recognition
- BILD: binaural intelligibility level difference
- CMR: co-modulation masking release
- EC: equalization and cancelation
- ERB: equivalent rectangular bandwidth
- HSR: human speech recognition
- JND: just-noticeable difference
- RMS: root mean square
- SI: speech intelligibility
- SII: speech intelligibility index
- SNR: signal-to-noise ratio
- SPL: sound pressure level
- SRT: speech reception threshold
- TI: transinformation index
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kollmeier, B., Brand, T., Meyer, B. (2008). Perception of Speech and Sound. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_4
DOI: https://doi.org/10.1007/978-3-540-49127-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9