Part of the book series: Springer Handbooks (SHB)

Abstract

The transformation of acoustical signals into auditory sensations can be characterized by psychophysical quantities such as loudness, tonality, or perceived pitch. The resolution limits of the auditory system produce spectral and temporal masking phenomena and impose constraints on the perception of amplitude modulations. Binaural hearing (i.e., utilizing the acoustical differences between the two ears) employs interaural time and intensity differences to produce localization and binaural unmasking phenomena such as the binaural intelligibility level difference, i.e., the difference in speech reception threshold between listening to speech in noise monaurally and listening with both ears.

The acoustical information available to the listener for perceiving speech even under adverse conditions can be characterized using the articulation index, the speech transmission index, and the speech intelligibility index. These indices objectively predict speech reception thresholds as a function of spectral content, signal-to-noise ratio, and preservation of amplitude modulations in the speech waveform that enters the listener's ear. The articulatory or phonetic information available to and received by the listener can be characterized by speech feature sets. Transinformation analysis allows one to determine the relative transmission error connected with each of these speech features. The comparison between man and machine in speech recognition allows one to test hypotheses and models of human speech perception. Conversely, automatic speech recognition may be improved by introducing human signal-processing principles into machine processing algorithms.
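
As a rough illustration of the transinformation analysis mentioned above, the following sketch computes the relative transinformation of one speech feature from a stimulus-by-response confusion matrix: the transmitted information T(x;y) divided by the stimulus entropy H(x). The function name, the matrix layout (rows are presented categories, columns are perceived categories), and the example counts are assumptions made for illustration and are not taken from the chapter.

```python
import math


def relative_transinformation(confusions):
    """Return T(x;y) / H(x) for a stimulus-by-response count matrix.

    Hypothetical helper: rows are presented feature values, columns are
    the feature values reported by the listener (or recognizer).
    """
    total = sum(sum(row) for row in confusions)
    if total == 0:
        raise ValueError("confusion matrix contains no responses")

    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]

    # Transmitted information in bits: sum over cells of p_ij * log2(p_ij / (p_i * p_j)).
    transmitted = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                p_ij = count / total
                transmitted += p_ij * math.log2(
                    count * total / (row_sums[i] * col_sums[j])
                )

    # Entropy of the presented stimuli in bits.
    stimulus_entropy = -sum(
        (r / total) * math.log2(r / total) for r in row_sums if r > 0
    )
    if stimulus_entropy == 0:
        raise ValueError("only one stimulus category was presented")

    return transmitted / stimulus_entropy


if __name__ == "__main__":
    # Made-up counts for a binary feature such as voicing (voiced vs. unvoiced).
    voicing = [[45, 5],
               [10, 40]]
    print(f"relative transinformation: {relative_transinformation(voicing):.3f}")
```

A value of 1 means the feature was transmitted without error, and a value of 0 means the responses carry no information about the presented feature. Computing this ratio per feature for both human listeners and a recognizer is one way the man-machine comparison described above could be set up.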


Abbreviations

ASR: automatic speech recognition
BILD: binaural intelligibility level difference
CMR: co-modulation masking release
EC: equalization and cancelation
ERB: equivalent rectangular bandwidth
HSR: human speech recognition
JND: just-noticeable difference
RMS: root mean square
SI: speech intelligibility
SII: speech intelligibility index
SNR: signal-to-noise ratio
SPL: sound pressure level
SRT: speech reception threshold
TI: transinformation index


Author information

Correspondence to Prof. Birger Kollmeier, Dr. Thomas Brand, or Bernd Meyer, Ph.D.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kollmeier, B., Brand, T., Meyer, B. (2008). Perception of Speech and Sound. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_4

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
