Theories of Vowel Inherent Spectral Change

Part of the Modern Acoustics and Signal Processing book series (MASP)


In many dialects of North American English, in addition to the vowels traditionally described as true or phonemic diphthongs, several vowels traditionally described as monophthongs also show substantial formant movement. This vowel inherent spectral change (VISC) has been found to be an important factor in the perception of vowel-phoneme identity. This chapter reviews literature pertinent to theories of the perceptually relevant aspects of VISC. Three basic hypotheses have been proposed: onset + offset, onset + slope, and onset + direction. Each takes the position that initial formant values are relevant, but they differ as to which aspect of formant movement is relevant. Of these, the weight of evidence indicates that the onset + offset hypothesis is superior, in that it leads to higher correct-classification rates and higher correlations with listeners’ vowel identification responses. Models that fit curves to whole formant trajectories have not, as yet, been found to outperform simple models based on formant measurements taken at two points (onset and offset) in the trajectory. A popular curve-fitting model, the first-order discrete cosine transform (DCT), is interpretable as a parameterization of the onset + offset hypothesis.
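The three hypotheses and the DCT remark can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's own procedure: the function names, the use of Hz, the choice of first and last samples as "onset" and "offset", the unnormalized DCT-II scaling, and the example formant values are all hypothetical.

```python
import math

def onset_offset(f1, f2):
    """Onset + offset: (F1, F2) at the first and last measurement points."""
    return (f1[0], f2[0]), (f1[-1], f2[-1])

def onset_slope(f1, f2, duration_s):
    """Onset + slope: initial (F1, F2) plus rate of change of each formant (Hz/s)."""
    slope = ((f1[-1] - f1[0]) / duration_s, (f2[-1] - f2[0]) / duration_s)
    return (f1[0], f2[0]), slope

def onset_direction(f1, f2):
    """Onset + direction: initial (F1, F2) plus the angle of movement in F1-F2 space."""
    angle = math.atan2(f2[-1] - f2[0], f1[-1] - f1[0])
    return (f1[0], f2[0]), angle

def dct_first_order(track):
    """Zeroth and first DCT-II coefficients of one formant track.

    Scaling is an assumption: c0 is the track mean and c1 weights a half cosine.
    """
    n = len(track)
    c0 = sum(track) / n
    c1 = (2.0 / n) * sum(x * math.cos(math.pi * (i + 0.5) / n)
                         for i, x in enumerate(track))
    return c0, c1

def dct_reconstruct(c0, c1, n):
    """Rebuild an n-point track from the two coefficients above."""
    return [c0 + c1 * math.cos(math.pi * (i + 0.5) / n) for i in range(n)]

if __name__ == "__main__":
    # Hypothetical two-point measurements (Hz) for one vowel token lasting 0.2 s.
    f1, f2 = [600.0, 500.0], [1800.0, 2100.0]
    print(onset_offset(f1, f2))
    print(onset_slope(f1, f2, 0.2))
    print(onset_direction(f1, f2))
    c0, c1 = dct_first_order(f1)
    print(c0, c1, dct_reconstruct(c0, c1, len(f1)))
```

With only two samples per track, the two DCT coefficients reduce to the midpoint and a scaled onset-minus-offset difference, so they carry the same information as the onset and offset values and the reconstruction returns the original pair. For longer trajectories the first-order fit is a smooth approximation, and its endpoints need not equal the measured onset and offset exactly.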


Keywords: Discrete Cosine Transform; Discrete Cosine Transform Coefficient; Pitch Period; Vowel Duration; Formant Movement



Abbreviations

DCT: Discrete cosine transform
F1: First formant
F2: Second formant
F3: Third formant
VISC: Vowel inherent spectral change



The writing of this chapter began when the author was a PhD student at the Department of Linguistics, University of Alberta, and was supported by a Social Sciences and Humanities Research Council of Canada Doctoral Fellowship. Thanks to Terrance M. Nearey, Peter F. Assmann, James M. Hillenbrand, and Christian E. Stilp for comments on earlier versions of this chapter.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Forensic Voice Comparison Laboratory, School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia
