Phonetic perspectives on modelling information in the speech signal

HAWKINS, S

doi:10.1007/s12046-011-0038-0

Published: 22 November 2011

Sadhana, volume 36, pages 555–586 (2011)

S HAWKINS¹

Abstract

This paper reassesses conventional assumptions about the informativeness of the acoustic speech signal, and shows how recent research on systematic variability in the acoustic signal is consistent with an alternative linguistic model that is more biologically plausible and compatible with recent advances in modelling embodied visual perception and action. Standard assumptions about the information available from the speech signal, especially strengths and limitations of phonological features and phonemes, are reviewed, and compared with an alternative approach based on Firthian prosodic analysis (FPA). FPA places more emphasis than standard models on the linguistic and interactional function of an utterance, de-emphasizes the need to identify phonemes, and uses formalisms that force us to recognize that every perceptual decision is context- and task-dependent. Examples of perceptually-significant phonetic detail that is neglected by standard models are discussed. Similarities between the theoretical approach recommended and current work on perception–action robots are explored.

Author information

Authors and Affiliations

Centre for Music and Science, Faculty of Music, University of Cambridge, Cambridge, CB3 9DP, UK
S HAWKINS

Corresponding author

Correspondence to S HAWKINS.

About this article

Cite this article

HAWKINS, S. Phonetic perspectives on modelling information in the speech signal. Sadhana 36, 555–586 (2011). https://doi.org/10.1007/s12046-011-0038-0

Published: 22 November 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s12046-011-0038-0
