The perceptual flow of phonetic information

  • Steven Greenberg
  • Thomas U. Christiansen
Perceptual/Cognitive Constraints on the Structure of Speech Communication: In Honor of Randy Diehl


Over a long and distinguished career, Randy Diehl has elucidated the brain mechanisms underlying spoken language processing. The present study touches on two of Randy’s central interests, phonetic features and Bayesian statistics. How does the brain go from sound to meaning? Traditional approaches to the study of speech intelligibility and word recognition are unlikely to provide a definitive answer. A finer-grained, Bayesian-inspired approach may help. In this study, listeners identified 11 Danish consonants spoken in a Consonant + Vowel + [l] environment. Each syllable was filtered so that only a portion of the original audio spectrum was presented. Three-quarter-octave bands of speech, centered at 750, 1,500, and 3,000 Hz, were presented individually and in combination. The conditional (posterior) probabilities associated with decoding the phonetic features Voicing, Manner, and Place of Articulation were computed from confusion matrices to delineate the perceptual flow of phonetic information processing. Analysis of the conditional probabilities associated with both correct and incorrect feature decoding suggests that decoding of Manner of articulation is linked to the decoding of Voicing (but not vice versa), and that decoding of Place of articulation is associated with decoding of Manner of articulation (but not the converse). Such feature-decoding asymmetries may reflect processing strategies in which the decoding of lower-level features, such as Voicing and Manner, is leveraged to enhance the recognition of more complex linguistic elements (e.g., phonetic segments, syllables, and words), especially in adverse listening conditions. These asymmetric feature-decoding patterns are consistent with a hierarchical, perceptual flow model of phonetic processing.
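
The following is a minimal sketch of how conditional feature-decoding probabilities of this kind could be tabulated from a consonant confusion matrix. It is not the authors' analysis code. The consonant set, the feature labels, the counts, and the function name `conditional_accuracy` are all invented for illustration and do not reproduce the study's Danish stimuli or data.

```python
from typing import Dict, Tuple

# Hypothetical feature table: consonant -> (voicing, manner, place).
# Illustrative labels only; not the 11 Danish consonants used in the study.
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "p": ("voiceless", "stop", "labial"),
    "b": ("voiced", "stop", "labial"),
    "t": ("voiceless", "stop", "alveolar"),
    "d": ("voiced", "stop", "alveolar"),
    "m": ("voiced", "nasal", "labial"),
}
FEATURE_INDEX = {"voicing": 0, "manner": 1, "place": 2}

# Confusion matrix stored sparsely: (stimulus, response) -> number of trials.
Confusions = Dict[Tuple[str, str], int]


def conditional_accuracy(confusions: Confusions, target: str, given: str,
                         given_correct: bool = True) -> float:
    """Estimate P(target feature decoded correctly | given feature decoded
    correctly), or conditioned on the given feature being decoded
    incorrectly when given_correct is False."""
    t = FEATURE_INDEX[target]
    g = FEATURE_INDEX[given]
    hits = 0
    total = 0
    for (stimulus, response), count in confusions.items():
        given_ok = FEATURES[stimulus][g] == FEATURES[response][g]
        if given_ok != given_correct:
            continue  # trial is outside the conditioning event
        total += count
        if FEATURES[stimulus][t] == FEATURES[response][t]:
            hits += count
    return hits / total if total else float("nan")


# Invented counts, for illustration only.
toy_confusions: Confusions = {
    ("p", "p"): 6, ("p", "b"): 2, ("p", "t"): 2,
    ("b", "b"): 5, ("b", "m"): 3, ("b", "d"): 2,
}

if __name__ == "__main__":
    for target, given in [("manner", "voicing"), ("voicing", "manner"),
                          ("place", "manner"), ("manner", "place")]:
        p = conditional_accuracy(toy_confusions, target, given)
        print(f"P({target} correct | {given} correct) = {p:.3f}")
```

Comparing the two directions for a feature pair, for example Manner given Voicing against Voicing given Manner, and repeating the comparison with `given_correct=False`, is one way to look for the kind of asymmetry the abstract describes.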


Keywords: Speech perception; Phonology; Bayesian modeling



This research was funded by the Carlsberg Foundation, Technical University of Denmark, and the United States Air Force Office of Scientific Research. The authors thank Torsten Dau for helpful suggestions and comments on various aspects of this research, as well as Andy Lotto and an anonymous reviewer for helpful suggestions on improving the original draft of this paper.



Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  1. Silicon Speech, Hidden Valley Lake, USA
  2. Oticon, Smørum, Denmark
