Theories of Vowel Inherent Spectral Change

Vowel Inherent Spectral Change

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

In many dialects of North American English, in addition to vowels which are traditionally described as true and phonetic diphthongs, several vowels traditionally described as monophthongs also have substantial formant movement. Vowel inherent spectral change (VISC) has also been found to be an important factor in the perception of vowel-phoneme identity. This chapter reviews literature pertinent to theories of the perceptually relevant aspects of VISC. Three basic hypotheses have been proposed: onset + offset, onset + slope, and onset + direction. Each takes the position that initial formant values are relevant, but they differ as to the relevant aspect of formant movement. Of these, the weight of evidence indicates that the onset + offset hypothesis is superior in terms of leading to higher correct-classification rates and higher correlation with listeners’ vowel identification responses. Models which fit curves to whole formant trajectories have, as yet, not been found to outperform simple models based on formant measurements taken at two points (onset and offset) in formant trajectories. A popular curve-fitting model (first-order discrete cosine transform, DCT) is interpretable as a parameterization of the onset + offset hypothesis.
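As a rough illustration of how the three hypotheses differ, the sketch below computes each parameterization for a single formant from two measurements (onset and offset) and a duration. The function name `visc_parameters` is hypothetical, and reducing "direction" to the sign of the change in one formant is a simplifying assumption made here, not a definition taken from the chapter. The stimulus values are the F2 onset, F2 offsets, and durations reported for Gay's (1970) /ɔ/–/ɔɪ/ stimuli in Note 2.

```python
def visc_parameters(onset_hz, offset_hz, duration_ms):
    """Return three candidate descriptions of one formant's movement.

    Assumption for illustration: "direction" is reduced to the sign of
    the frequency change (+1 rising, 0 flat, -1 falling).
    """
    change_hz = offset_hz - onset_hz
    return {
        "onset_offset": (onset_hz, offset_hz),
        "onset_slope": (onset_hz, change_hz / duration_ms),  # Hz/ms
        "onset_direction": (onset_hz, (change_hz > 0) - (change_hz < 0)),
    }


# F2 values from Note 2: onset fixed at 840 Hz, (duration in ms, offset in Hz).
ONSET_HZ = 840
STEPS = [(100, 1320), (110, 1320), (120, 1320), (130, 1440), (140, 1440)]

params = [visc_parameters(ONSET_HZ, offset, dur) for dur, offset in STEPS]
slopes = [round(p["onset_slope"][1], 2) for p in params]
directions = [p["onset_direction"][1] for p in params]

for (dur, offset), slope, direction in zip(STEPS, slopes, directions):
    print(f"{dur} ms: offset {offset} Hz, slope {slope} Hz/ms, direction {direction:+d}")
```

In this set the first three steps share an offset but differ in slope, and every step shares the same direction, so the three parameterizations do not group the stimuli in the same way.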

Notes

  1.

    The terminology used here is based on Gottfried et al.’s (1993) “onset + offset”, “onset + slope”, and “onset + direction”, which will frequently be abbreviated to offset, slope, and direction. Nearey and Assmann’s (1986) “dual-target”, “target-plus-slope”, and “target-plus-direction” represent the same hypotheses. Gottfried et al. only tested F2 slope for their slope hypothesis, but both F1 and F2 slopes were tested in studies conducted by Nearey and colleagues. In contrast to Lehiste and Peterson’s (1961) use of the term target, Nearey and Assmann’s term dual-target does not imply that there must be steady states at the beginning and end of the diphthongs.

  2.

    The interpretation of Gay’s (1970) results is hindered by contradictions between the description of his stimuli and the discussion of the results. The discussion and graphical results suggest that, in his Experiment II, F2 offset did not covary with duration so as to maintain a fixed slope; rather, F2 offset stepped up at a slower rate than duration. For example, for /ɔ/–/ɔɪ/ stimuli with an F2 onset of 840 Hz, the first three duration steps of 100, 110, and 120 ms all had an F2 offset of 1,320 Hz, and thus progressively shallower slopes of 4.80, 4.36, and 4.00 Hz/ms; the next two duration steps of 130 and 140 ms both had an F2 offset of 1,440 Hz, and thus slopes of 4.62 and 4.29 Hz/ms; etc.

  3.

    Although the classical description of a diphthong includes an initial steady state, a glide, and a final steady state (Lehiste and Peterson 1961), there is usually no second steady state (see Holbrook and Fairbanks 1962), the first steady state may disappear at fast speaking rates (Gay 1968), and intelligible diphthongs can be synthesized using only a glide (Gay 1970).

  4.

    There has been an ongoing disagreement between Strange and colleagues and Nearey and colleagues over the interpretation of silent-center results. Strange and Jenkins (2013, Chap. 5) review one of their earlier studies (Jenkins and Strange 1999) in which a silent-center condition (first 3 pitch periods and last 4 pitch periods of the vowel) resulted in listeners having high correct-classification rates for vowel-phoneme identity even when vowel-duration information was neutralized, but playing the last 8 pitch periods resulted in low correct-classification rates. They claim that these results argue “against the hypothesis that nucleus + offglide direction information provides the critical dynamic spectral information for AE vowels in continuous speech contexts”. On the contrary, the silent-center results are exactly what the onset + offset hypothesis would predict, and from their citations it appears that they are referring to what I am calling the onset + offset hypothesis. They claim that the last 8 pitch periods included “target plus offglide”, but unless the whole vowel was about 8 pitch periods long this cannot be the case (again interpreting their “target plus offglide” as equivalent to what I am calling onset + offset). In fact, the two shortest vowel tokens they tested, tokens of /ɪ/ and /ʊ/ with durations of 12–14 pitch periods, had better correct-classification rates in the last-8-pitch-periods condition than in the silent-center condition. The situation was reversed once the vowel tokens were about twice as long as 8 pitch periods (for all vowel phonemes with tokens averaging more than 15 pitch periods long). Correct-classification rates were very poor for the longest vowels (tokens of /æ/ and /a/, which were 19–21 pitch periods long), and worst for tokens of /e/ and /o/, which were 18–21 pitch periods long and had greater formant movement than /æ/ and /a/. What Jenkins and Strange appear to have tested for the longer vowels is an offset-only hypothesis; I am not aware of anyone ever having seriously advocated such a hypothesis.

  5.

    Assmann and Katz (2000) tested the perception of stimuli in which the F1 trajectory was flattened and F2 unchanged, and stimuli in which the F2 trajectory was flattened and F1 unchanged. Listeners’ correct identification rates for English nominal monophthongs and phonetic diphthongs significantly decreased when either formant was flattened. Although some vowels were affected more by F1 flattening, some were affected more by F2 flattening. The results indicate that a VISC theory applicable across vowel categories should refer to formant movement in both F1 and F2.

  6.

    Zahorian and Jagharghi (1991, 1993) reserved the term discrete cosine transform (DCT) for a curve fitted to a spectrum at a single time frame (what is normally referred to as a cepstrum), and used the term discrete cosine series (DCS) for curves fitted to a time-ordered series of formant or cepstral coefficient values.
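The kind of curve fit described in Note 6, a cosine series fitted to a time-ordered series of formant values, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited study: it assumes a DCT-II-style basis sampled at half-frame offsets, keeps only the zeroth and first terms, and uses a hypothetical linear F2 track. The function names are invented for the example. Because the first-order basis function runs from roughly +1 at the first frame to roughly -1 at the last, the two coefficients map directly onto a fitted onset and a fitted offset.

```python
import math


def cosine_basis(n_frames):
    """First-order cosine basis at half-frame offsets (assumed DCT-II style)."""
    return [math.cos(math.pi * (n + 0.5) / n_frames) for n in range(n_frames)]


def fit_first_order(track_hz):
    """Least-squares fit of track[n] ~ c0 + c1 * cos(pi * (n + 0.5) / N).

    The basis sums to zero over the frames, so it is orthogonal to the
    constant term and the two coefficients can be estimated separately.
    """
    basis = cosine_basis(len(track_hz))
    c0 = sum(track_hz) / len(track_hz)
    c1 = sum(x * b for x, b in zip(track_hz, basis)) / sum(b * b for b in basis)
    return c0, c1


def fitted_endpoints(c0, c1, n_frames):
    """Values of the fitted curve at the first and last frames."""
    edge = math.cos(math.pi * 0.5 / n_frames)
    return c0 + c1 * edge, c0 - c1 * edge


# Hypothetical F2 track: 11 frames rising linearly from 840 Hz to 1320 Hz.
track = [840 + 48 * n for n in range(11)]
c0, c1 = fit_first_order(track)
onset_hat, offset_hat = fitted_endpoints(c0, c1, len(track))
print(f"c0 = {c0:.1f} Hz, c1 = {c1:.1f} Hz")
print(f"fitted onset = {onset_hat:.1f} Hz, fitted offset = {offset_hat:.1f} Hz")
```

Here `c0` is the mean of the track and `c1` is negative for a rising formant, since the basis function itself falls over time. The pair (`c0`, `c1`) carries the same information as the pair of fitted endpoints, which is the sense in which a first-order fit can be read as an onset + offset parameterization.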

Abbreviations

DCT: Discrete cosine transform

F: Formant

F1: First formant

F2: Second formant

F3: Third formant

t: Time

VISC: Vowel inherent spectral change

References

  • Adank, P., van Hout, R., Smits, R.: An acoustic description of the vowels of Northern and Southern standard Dutch. J. Acoust. Soc. Am. 116, 1729–1738 (2004). doi:10.1121/1.1779271

  • Adank, P., van Hout, R., van de Velde, H.: An acoustic description of the vowels of Northern and Southern standard Dutch II: regional varieties. J. Acoust. Soc. Am. 121, 1130–1141 (2007). doi:10.1121/1.2409492

  • Andruski, J.E., Nearey, T.M.: On the sufficiency of compound target specification of isolated vowels in /bVb/ syllables. J. Acoust. Soc. Am. 91, 390–410 (1992). doi:10.1121/1.402781

  • Assmann, P.F., Katz, W.F.: Time-varying spectral change in the vowels of children and adults. J. Acoust. Soc. Am. 108, 1856–1866 (2000). doi:10.1121/1.1289363

  • Assmann, P.F., Katz, W.F.: Synthesis fidelity and time-varying spectral change in vowels. J. Acoust. Soc. Am. 117, 886–895 (2005). doi:10.1121/1.1852549

  • Assmann, P.F., Nearey, T.M., Bharadwaj, S.V.: Developmental patterns in children’s speech: patterns of spectral change in vowels. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 9). Springer, Heidelberg (2013)

  • Assmann, P.F., Nearey, T.M., Hogan, J.T.: Vowel identification: orthographic, perceptual, and acoustic aspects. J. Acoust. Soc. Am. 71, 975–989 (1982). doi:10.1121/1.387579

  • Bladon, A.: Diphthongs: a case study of dynamic articulatory processing. Speech Commun. 4, 145–154 (1985). doi:10.1016/0167-6393(84)90040-2

  • Bond, Z.S.: The effects of varying glide duration on diphthong identification. Lang. Speech 21, 253–278 (1978)

  • Bond, Z.S.: Experiments with synthetic diphthongs. J. Phonetics 10, 259–264 (1982)

  • Borzone de Manrique, A.M.: Acoustic analysis of Spanish diphthongs. Phonetica 36, 194–206 (1979)

  • Clermont, F.: Spectro-temporal description of diphthongs in F1–F2–F3 space. Speech Commun. 13, 377–390 (1993). doi:10.1016/0167-6393(93)90036-K

  • Divenyi, P.: Perception of complete and incomplete formant transitions in vowels. J. Acoust. Soc. Am. 126, 1427–1439 (2009). doi:10.1121/1.3167482

  • Hillenbrand, J.M.: Static and dynamic approaches to understanding vowel perception. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 2). Springer, Heidelberg (2013)

  • Fox, R.: Perceptual structure of monophthongs and diphthongs in English. Lang. Speech 26, 21–49 (1983)

  • Fox, R.: Dynamic information in identification and discrimination of vowels. Phonetica 46, 97–116 (1989)

  • Fox, R.A., McGory, J.T.: Second language acquisition of a regional dialect of American English by native Japanese speakers. In: Bohn, O.-S., Munro, M.J. (eds.) Language experience in second language speech learning: in honor of James Emil Flege, pp. 117–134. John Benjamins, Amsterdam (2007)

  • Gay, T.: Effects of speaking rate on diphthong formant movements. J. Acoust. Soc. Am. 44, 1570–1573 (1968). doi:10.1121/1.1911298

  • Gay, T.: A perceptual study of American English diphthongs. Lang. Speech 13, 65–88 (1970)

  • Gottfried, M., Miller, J.D., Meyer, D.J.: Three approaches to the classification of American English diphthongs. J. Phonetics 21, 205–229 (1993)

  • Hargus Ferguson, S., Kewley-Port, D.: Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 112, 259–271 (2002). doi:10.1121/1.1482078

  • Harrington, J., Cassidy, S.: Dynamic and target theories of vowel classification: evidence from monophthongs and diphthongs in Australian English. Lang. Speech 37, 357–373 (1994). doi:10.1177/002383099403700402

  • Hillenbrand, J.M., Nearey, T.M.: Identification of resynthesized /hVd/ syllables: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509–3523 (1999). doi:10.1121/1.424676

  • Hillenbrand, J.M., Getty, L.A., Clark, M.J., Wheeler, K.: Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97, 3099–3111 (1995). doi:10.1121/1.411872

  • Hillenbrand, J.M., Clark, M.J., Nearey, T.M.: Effect of consonant environment on vowel formant patterns. J. Acoust. Soc. Am. 109, 748–763 (2001). doi:10.1121/1.1337959

  • Holbrook, A., Fairbanks, G.: Diphthong formants and their movements. J. Speech Hear. Res. 5, 38–58 (1962)

  • Huang, C.B.: Modelling human vowel identification using aspects of formant trajectory and context. In: Tohkura, Y., Vatikiotis-Bateson, E., Sagisaka, Y. (eds.) Speech Perception, Production and Linguistic Structure, pp. 43–61. Ohmsha, Tokyo/IOS, Amsterdam (1992)

  • Jacewicz, E., Fox, R.A.: Cross-dialectal differences in dynamic formant patterns in American English vowels. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 8). Springer, Heidelberg (2013)

  • Jacewicz, E., Fujimura, O., Fox, R.A.: Dynamics in diphthong perception. In: Sole, J., Recasens, D., Romero, J. (eds.) Proceedings of the 15th international congress of phonetic sciences, Barcelona. pp. 993–996. Causal Productions, Australia (2003)

  • Jenkins, J.J., Strange, W.: Perception of dynamic information for vowels in syllable onsets and offsets. Percept. Psychophys. 61, 1200–1210 (1999). doi:10.3758/BF03207623

  • Jenkins, J.J., Strange, W., Miranda, S.: Vowel identification in mixed-speaker silent-center syllables. J. Acoust. Soc. Am. 95, 1030–1043 (1994). doi:10.1121/1.410014

  • Kewley-Port, D., Goodman, S.G.: Thresholds for second formant transitions in front vowels. J. Acoust. Soc. Am. 118, 3252–3260 (2005). doi:10.1121/1.2074667

  • Lehiste, I., Peterson, G.E.: Transitions, glides, and diphthongs. J. Acoust. Soc. Am. 33, 268–277 (1961). doi:10.1121/1.1908681

  • Miller, J.D.: Auditory-perceptual interpretation of the vowel. J. Acoust. Soc. Am. 85, 2114–2134 (1989). doi:10.1121/1.397862

  • Moreton, E.: Realization of the English postvocalic [voice] contrast in F1 and F2. J. Phonetics 32, 1–33 (2004). doi:10.1016/S0095-4470(03)00004-4

  • Morrison, G.S., Nearey, T.M.: Testing theories of vowel inherent spectral change. J. Acoust. Soc. Am. 122, EL15–EL22 (2007). doi:10.1121/1.2739111

  • Morrison, G.S.: Vowel inherent spectral change in forensic voice comparison. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 11). Springer, Heidelberg (2013)

  • Nábělek, A.K., Czyzewski, Z., Crowley, H.: Vowel boundaries for steady-state and linear formant trajectories. J. Acoust. Soc. Am. 94, 675–687 (1993). doi:10.1121/1.406885

  • Nábělek, A.K., Czyzewski, Z., Crowley, H.: Cues for perception of the diphthong /aɪ/ in either noise or reverberation. Part I. duration of the transition. J. Acoust. Soc. Am. 95, 2681–2693 (1994). doi:10.1121/1.409837

  • Nearey, T.M., Assmann, P.F.: Modeling the role of vowel inherent spectral change in vowel identification. J. Acoust. Soc. Am. 80, 1297–1308 (1986). doi:10.1121/1.394433

  • Nearey, T.M.: Evidence for the perceptual relevance of vowel-inherent spectral change for front vowels in Canadian English. In: Elenius, K., Branderud, P. (eds.) Proceedings of the 13th congress of phonetic sciences, Stockholm, pp. 678–681. KTH, Sweden (1995)

  • Nearey, T.M.: Vowel inherent spectral change in the vowels of North American English. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 4). Springer, Heidelberg (2013)

  • Neel, A.T.: Formant detail needed for vowel identification. Acoust. Res. Lett. Online 5, 125–131 (2004). doi:10.1121/1.1764452

  • Pols, L.C.W.: Spectral analysis and identification of Dutch vowels in monosyllabic words. PhD dissertation, University of Amsterdam. Academische Pers B.V., Amsterdam (1977)

  • Strange, W., Jenkins, J.J., Johnson, T.L.: Dynamic specification of coarticulated vowels. J. Acoust. Soc. Am. 74, 695–705 (1983). doi:10.1121/1.389855

  • Strange, W., Jenkins, J.J.: Dynamic specification of coarticulated vowels: Research chronology, theory, and hypotheses. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 5). Springer, Heidelberg (2013). doi:10.1007/978-3-642-14209-3_5

  • Watson, C., Harrington, J.: Acoustic evidence of dynamic formant trajectories in Australian English vowels. J. Acoust. Soc. Am. 106, 458–468 (1999). doi:10.1121/1.427069

  • Wise, C.M.: Acoustic structure of English diphthongs and semivowels vis-a-vis their phonetic symbolization. In: Zwirner, E., Bethge, W. (eds.) Proceedings of the 5th international congress on phonetic sciences, Münster, pp. 589–593. S. Karger, Switzerland (1964)

  • Zahorian, S.A., Jagharghi, A.J.: Speaker normalization of static and dynamic vowel spectral features. J. Acoust. Soc. Am. 90, 67–75 (1991). doi:10.1121/1.402350

  • Zahorian, S.A., Jagharghi, A.J.: Spectral-shape features versus formants as acoustic correlates for vowels. J. Acoust. Soc. Am. 94, 1966–1982 (1993). doi:10.1121/1.407520

Acknowledgments

The writing of this chapter began when the author was a PhD student at the Department of Linguistics, University of Alberta, and was supported by a Social Sciences and Humanities Research Council of Canada Doctoral Fellowship. Thanks to Terrance M. Nearey, Peter F. Assmann, James M. Hillenbrand, and Christian E. Stilp for comments on earlier versions of this chapter.

Author information

Correspondence to Geoffrey Stewart Morrison.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Morrison, G.S. (2013). Theories of Vowel Inherent Spectral Change. In: Morrison, G., Assmann, P. (eds) Vowel Inherent Spectral Change. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14209-3_3

  • DOI: https://doi.org/10.1007/978-3-642-14209-3_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14208-6

  • Online ISBN: 978-3-642-14209-3

  • eBook Packages: Engineering, Engineering (R0)
