Abstract
This paper reassesses conventional assumptions about the informativeness of the acoustic speech signal, and shows how recent research on systematic variability in the acoustic signal is consistent with an alternative linguistic model that is more biologically plausible and compatible with recent advances in modelling embodied visual perception and action. Standard assumptions about the information available from the speech signal, especially strengths and limitations of phonological features and phonemes, are reviewed, and compared with an alternative approach based on Firthian prosodic analysis (FPA). FPA places more emphasis than standard models on the linguistic and interactional function of an utterance, de-emphasizes the need to identify phonemes, and uses formalisms that force us to recognize that every perceptual decision is context- and task-dependent. Examples of perceptually significant phonetic detail that is neglected by standard models are discussed. Similarities between the theoretical approach recommended and current work on perception–action robots are explored.
Cite this article
HAWKINS, S. Phonetic perspectives on modelling information in the speech signal. Sadhana 36, 555–586 (2011). https://doi.org/10.1007/s12046-011-0038-0
DOI: https://doi.org/10.1007/s12046-011-0038-0