Phonetic perspectives on modelling information in the speech signal

Sadhana

Abstract

This paper reassesses conventional assumptions about the informativeness of the acoustic speech signal, and shows how recent research on systematic variability in the acoustic signal is consistent with an alternative linguistic model that is more biologically plausible and compatible with recent advances in modelling embodied visual perception and action. Standard assumptions about the information available from the speech signal, especially strengths and limitations of phonological features and phonemes, are reviewed, and compared with an alternative approach based on Firthian prosodic analysis (FPA). FPA places more emphasis than standard models on the linguistic and interactional function of an utterance, de-emphasizes the need to identify phonemes, and uses formalisms that force us to recognize that every perceptual decision is context- and task-dependent. Examples of perceptually-significant phonetic detail that is neglected by standard models are discussed. Similarities between the theoretical approach recommended and current work on perception–action robots are explored.

Author information

Correspondence to S HAWKINS.

Cite this article

HAWKINS, S. Phonetic perspectives on modelling information in the speech signal. Sadhana 36, 555–586 (2011). https://doi.org/10.1007/s12046-011-0038-0

  • DOI: https://doi.org/10.1007/s12046-011-0038-0
