
Extracting Language Content from Speech Sounds: The Information Theoretic Approach

Chapter in Speech Perception

Part of the book series: Springer Handbook of Auditory Research (SHAR, volume 74)

Abstract

Speech comprehension involves recovering a speaker’s intended meaning from the speech sounds that they produce. While the sensory-driven components of this process have been widely investigated, the impact of speech content (i.e., linguistic information) on sensory processing is much less well understood. Here we summarize the growing body of research demonstrating that neural processing of speech sounds is influenced by morpheme- and word-level statistical properties of the information conveyed. We introduce information-theoretic measures such as entropy and surprisal and review evidence that they are reflected in neural responses. These findings help uncover fundamental organizational principles of the language system: what units are stored and how they are accessed. Modeling sensitivity to the information content of the speech signal helps explain the interface between (i) auditory processes operating on speech sounds and (ii) the words and meanings that those sounds convey.
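The abstract names two information-theoretic measures, surprisal and entropy. As a minimal sketch of one common way to compute them during spoken-word recognition (an assumption for illustration, not necessarily the chapter's exact formulation), the code below treats the set of words still consistent with the phonemes heard so far as a candidate "cohort". Phoneme surprisal is -log2 P(phoneme | preceding phonemes), and cohort entropy is the Shannon entropy of the frequency-weighted candidates that remain. The lexicon, its counts, and the use of letters as stand-ins for phonemes are hypothetical toy choices, not data from the chapter.

```python
import math

# Hypothetical toy lexicon: word -> frequency count. Values are illustrative
# only, and letters stand in for phonemes to keep the example readable.
LEXICON = {"cat": 40, "cap": 10, "can": 30, "dog": 20}


def cohort(prefix):
    """Return the words (with counts) consistent with the phonemes heard so far."""
    return {w: f for w, f in LEXICON.items() if w.startswith(prefix)}


def entropy(dist):
    """Shannon entropy, in bits, of a frequency-weighted candidate distribution."""
    total = sum(dist.values())
    return sum((f / total) * math.log2(total / f) for f in dist.values())


def phoneme_measures(word):
    """Return (phoneme, surprisal, cohort entropy) for each position in `word`.

    Surprisal is -log2 P(phoneme | prefix), with the probability estimated as the
    summed count of the cohort after the phoneme divided by the summed count of
    the cohort before it. Entropy is taken over the cohort remaining afterwards.
    """
    if word not in LEXICON:
        raise ValueError(f"{word!r} is not in the toy lexicon")
    results = []
    for i, ph in enumerate(word):
        before_total = sum(cohort(word[:i]).values())
        after = cohort(word[: i + 1])
        surprisal = math.log2(before_total / sum(after.values()))
        results.append((ph, surprisal, entropy(after)))
    return results


for ph, s, h in phoneme_measures("cat"):
    print(f"{ph}: surprisal = {s:.3f} bits, cohort entropy = {h:.3f} bits")
```

In this toy setting, hearing "c" leaves three candidates, so entropy stays high; "a" rules nothing out and carries zero surprisal; the final "t" is more surprising (1 bit) because it eliminates "cap" and "can", after which a single candidate remains and entropy falls to zero.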



Conflict of Interest

  • The authors declare that they have no conflict of interest.

Author information


Correspondence to Laura Gwilliams.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Gwilliams, L., Davis, M.H. (2022). Extracting Language Content from Speech Sounds: The Information Theoretic Approach. In: Holt, L.L., Peelle, J.E., Coffin, A.B., Popper, A.N., Fay, R.R. (eds) Speech Perception. Springer Handbook of Auditory Research, vol 74. Springer, Cham. https://doi.org/10.1007/978-3-030-81542-4_5
