Simulation and Identification of Vowels Based on a Time-Varying Model of the Vocal Tract Area Function

Chapter in: Vowel Inherent Spectral Change

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

In their purest form, vowels can be conceived as being produced with static configurations of the vocal tract shape. Laboratory measurements of both acoustic and articulatory characteristics of vowels are typically performed under this assumption. In natural, connected speech, however, the vocal tract shape undergoes nearly continuous change, so a truly “static” configuration is rarely produced. Listeners are nonetheless able to identify vowels in this time-varying situation, often with greater accuracy than vowels deliberately produced without any vocal tract change. This chapter examines the time-varying changes in vocal tract shape that produce vowel inherent spectral change. Specifically, a model of the vocal tract area function is used to investigate how time-dependent formant frequencies originate from movement of the vocal tract.
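
Formant frequencies are the resonances of the tube described by the area function, so any movement of the area function moves the formants. The sketch below illustrates that link under simplifying assumptions; it is not the chapter's simulation method, which the reference list suggests is a wave-reflection model with losses (Liljencrants 1985; Story 1995). Here the tract is a chain of lossless cylindrical tubelets of equal length, driven by an ideal flow source at the glottis and terminated by zero acoustic pressure at the lips. Resonances are then the zeros of the (2,2) element of the chain (ABCD) matrix. The speed of sound and the function names are assumptions for illustration.

```python
import math

SOUND_SPEED_CM_S = 35000.0  # assumed speed of sound in warm, humid air (cm/s)


def _k22(freq_hz, areas_cm2, section_len_cm):
    """(2,2) element of the glottis-to-lips chain matrix of a lossless tube.

    With an ideal flow source at the glottis and zero pressure at the lips,
    U_lips / U_glottis = 1 / K22, so resonances sit at the zeros of K22.
    Characteristic impedance is taken as 1/area: only area ratios affect
    where the zeros fall, so the rho*c scale factor is omitted.
    """
    phase = 2.0 * math.pi * freq_hz * section_len_cm / SOUND_SPEED_CM_S
    c, s = math.cos(phase), math.sin(phase)
    m = [[1 + 0j, 0j], [0j, 1 + 0j]]  # running matrix product, starts as identity
    for area in areas_cm2:  # areas ordered from glottis to lips
        z = 1.0 / area
        sec = [[c, 1j * z * s], [1j * s / z, c]]
        m = [[m[0][0] * sec[0][0] + m[0][1] * sec[1][0],
              m[0][0] * sec[0][1] + m[0][1] * sec[1][1]],
             [m[1][0] * sec[0][0] + m[1][1] * sec[1][0],
              m[1][0] * sec[0][1] + m[1][1] * sec[1][1]]]
    return m[1][1].real  # lossless chain: diagonal elements stay real


def formants(areas_cm2, section_len_cm, n_formants=3, f_max_hz=5000.0, step_hz=10.0):
    """Lowest resonance frequencies (Hz) of the tube, found by a scan plus bisection."""
    found = []
    f_prev = 1.0  # grid offset so grid points rarely land exactly on a root
    v_prev = _k22(f_prev, areas_cm2, section_len_cm)
    f = f_prev + step_hz
    while f <= f_max_hz and len(found) < n_formants:
        v = _k22(f, areas_cm2, section_len_cm)
        if v_prev * v < 0.0:  # a sign change brackets a zero of K22
            lo, hi, v_lo = f_prev, f, v_prev
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                v_mid = _k22(mid, areas_cm2, section_len_cm)
                if v_lo * v_mid <= 0.0:
                    hi = mid
                else:
                    lo, v_lo = mid, v_mid
            found.append(0.5 * (lo + hi))
        f_prev, v_prev = f, v
        f += step_hz
    return found
```

For a uniform 17.5 cm tube (for example, `formants([3.0] * 44, 17.5 / 44)`) this returns values close to 500, 1500, and 2500 Hz, the quarter-wavelength resonances c/4L, 3c/4L, and 5c/4L. Calling it once per frame of a time-varying area function would turn A(x,t) into approximate F1(t), F2(t), and F3(t) contours.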

Notes

  1.

    Although any artificially generated speech is strictly synthetic, the term simulated speech is used in this chapter to denote speech produced, to some degree, by simulating the physical processes of human sound production. These consist primarily of vocal fold vibration, acoustic wave propagation in the tracheal, nasal, and vocal tract systems, and the radiated acoustic output. In contrast, formant synthesis attempts to replicate the acoustic properties of the speech output signal, but not necessarily any of the physical processes that produce those properties.

  2.

    The publications in which each area function set was reported are as follows: SF0 (Story et al. 1998); SF1, SF2, SF3, SM1, SM2, and SM3 (Story 2005); SM0 (Story et al. 1996; also reported, in resampled form, in Story and Titze 1998); SM0-2 (Story 2008).

  3.

    Following Story (2005, 2009), the PCA was performed on the equivalent diameters of the cross-sectional areas rather than on the areas themselves.

  4.

    To obtain spectro-temporal information for the vowel simulation, time-dependent formant frequencies were measured from productions of eleven American English vowels (/i, ɪ, e, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/) spoken in citation form by an adult male speaker. The vowels were recorded in a sound-treated room with an AKG CS1000 microphone. The signal was digitized at a sampling frequency of 44.1 kHz with a Kay Elemetrics CSL4400 and saved as a WAV file. Formant frequencies were then estimated over the time course of each vowel with the formant analysis module in Praat (Boersma and Weenink 2009). Formant analysis parameters were adjusted manually until the F1 and F2 contours were aligned with the centers of their respective formant bands in a simultaneously displayed wide-band spectrogram. Fundamental frequency (F0) and intensity (I0) contours for each vowel were also extracted with the corresponding Praat modules. All formant, F0, and I0 contours were transferred to vector form in Matlab (Mathworks 2008) for further processing.
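
Note 3 states that the PCA was performed on equivalent diameters rather than on the cross-sectional areas themselves. The sketch below shows one way to do that, assuming each area function is a list of areas sampled at the same number of sections from glottis to lips and that the circular equivalent diameter \(D = \sqrt{4A/\pi}\) is meant. It returns the mean diameter function \(\Omega(x)\) and the leading modes \(\phi_1(x), \phi_2(x), \ldots\) using power iteration with deflation on the sample covariance matrix; this is a dependency-free stand-in for a library eigensolver or SVD, and the function names are hypothetical.

```python
import math


def area_to_diameter(areas):
    """Equivalent diameter D = sqrt(4A/pi) of a circle with the same area."""
    return [math.sqrt(4.0 * a / math.pi) for a in areas]


def diameter_to_area(diameters):
    """Inverse mapping A = (pi/4) * D**2."""
    return [math.pi / 4.0 * d * d for d in diameters]


def diameter_modes(area_sets, n_modes=2, iters=200):
    """Mean diameter function Omega(x) and leading principal components of diameters.

    area_sets: list of area functions, each with the same number of sections.
    Uses power iteration with deflation on the sample covariance matrix.
    """
    diams = [area_to_diameter(a) for a in area_sets]
    n, m = len(diams), len(diams[0])
    omega = [sum(d[i] for d in diams) / n for i in range(m)]
    centered = [[d[i] - omega[i] for i in range(m)] for d in diams]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    modes = []
    for _ in range(n_modes):
        v = [1.0 + 0.1 * i for i in range(m)]  # arbitrary non-symmetric start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            lam = math.sqrt(sum(x * x for x in w))  # converges to the largest eigenvalue
            if lam == 0.0:
                break  # no variance left; this mode is undefined
            v = [x / lam for x in w]
        modes.append(v)
        # Deflate: remove the variance captured by this mode before finding the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return omega, modes
```

The sign of each mode is arbitrary, so a mode and its negative are equally valid. Projecting a mean-removed diameter function onto \(\phi_1\) and \(\phi_2\) gives the \(q_1\) and \(q_2\) values for that shape.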

Abbreviations

  • \(\Omega(x)\): Mean vocal tract diameter function
  • \(\phi_1(x)\): Mode 1 (first principal component)
  • \(\phi_2(x)\): Mode 2 (second principal component)
  • \(A(x)\): Vocal tract area function
  • \(A(x,t)\): Time-dependent vocal tract area function
  • \(q_1\): Scaling coefficient of the first mode
  • \(q_1(t)\): Time-dependent version of \(q_1\)
  • \(q_2\): Scaling coefficient of the second mode
  • \(q_2(t)\): Time-dependent version of \(q_2\)
  • F0: Fundamental frequency
  • F1: First formant frequency
  • F2: Second formant frequency
  • F3: Third formant frequency
  • I0: Intensity
  • MRI: Magnetic resonance imaging
  • PCA: Principal component analysis
  • VISC: Vowel inherent spectral change
  • XRMB: X-ray microbeam
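
The abbreviations suggest how these quantities combine. Assuming the mode-based form used in Story's earlier work (e.g., Story 2005), which is consistent with Note 3's statement that the PCA was done on equivalent diameters, the time-varying area function would presumably be \(A(x,t) = \frac{\pi}{4}\left[\Omega(x) + q_1(t)\phi_1(x) + q_2(t)\phi_2(x)\right]^2\). The sketch below evaluates that expression frame by frame. The clamping of negative diameters and the straight-line coefficient trajectory helper are illustrative assumptions, not details taken from the chapter.

```python
import math


def area_function(omega, phi1, phi2, q1, q2):
    """A(x) = (pi/4) * [Omega(x) + q1*phi1(x) + q2*phi2(x)]**2, one value per section.

    omega, phi1, phi2: equal-length lists (one value per tubelet, glottis to lips).
    q1, q2: scalar mode coefficients for a single time frame.
    """
    areas = []
    for o, p1, p2 in zip(omega, phi1, phi2):
        d = o + q1 * p1 + q2 * p2  # equivalent diameter of this section
        d = max(d, 0.0)            # assumption: treat a negative diameter as a closure
        areas.append(math.pi / 4.0 * d * d)
    return areas


def area_function_over_time(omega, phi1, phi2, q1_t, q2_t):
    """A(x, t) as a list of area functions, one per frame of q1(t) and q2(t)."""
    return [area_function(omega, phi1, phi2, a, b) for a, b in zip(q1_t, q2_t)]


def linear_path(start, end, n_frames):
    """Hypothetical helper: straight-line coefficient trajectory between two targets."""
    return [start + (end - start) * k / (n_frames - 1) for k in range(n_frames)]
```

Driving `area_function_over_time` with, say, `linear_path(q1_start, q1_end, n)` for \(q_1(t)\) and a similar path for \(q_2(t)\) yields a sequence of area functions whose resonances trace out time-varying formant contours.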

References

  • Baer, T., Gore, J.C., Gracco, L.C., Nye, P.W.: Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels. J. Acoust. Soc. Am. 90, 799–828 (1991). doi:10.1121/1.401949

  • Boersma, P., Weenink, D.: Praat, Version 5.1. www.praat.org (2009). Last viewed February 2, 2009

  • Bunton, K., Story, B.H.: Identification of synthetic vowels based on selected vocal tract area functions. J. Acoust. Soc. Am. 125, 19–22 (2009). doi:10.1121/1.3033740

  • Bunton, K., Story, B.H.: Identification of synthetic vowels based on a time-varying model of the vocal tract area function. J. Acoust. Soc. Am. 127, EL146–EL152 (2010). doi:10.1121/1.3313921

  • Fant, G.: The Acoustic Theory of Speech Production. Mouton, The Hague (1960)

  • Hillenbrand, J., Getty, L.A., Clark, M.J., Wheeler, K.: Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97, 3099–3111 (1995). doi:10.1121/1.409456

  • Hillenbrand, J., Nearey, T.: Identification of resynthesized /hVd/ utterances: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509–3523 (1999). doi:10.1121/1.411694

  • Hillenbrand, J., Clark, M., Houde, R.: Some effects of duration on vowel recognition. J. Acoust. Soc. Am. 108, 3013–3022 (2000). doi:10.1121/1.1323463

  • Hillenbrand, J.M.: Static and dynamic approaches to understanding vowel perception. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 2). Springer, Heidelberg (2013)

  • Klatt, D.H., Klatt, L.C.: Analysis, synthesis, and perception of voice quality variations among male and female talkers. J. Acoust. Soc. Am. 87, 820–857 (1990). doi:10.1121/1.398894

  • Labov, W., Ash, S., Boberg, C.: The Atlas of North American English: Phonetics. Mouton de Gruyter, Berlin (2006)

  • Liljencrants, J.: Speech synthesis with a reflection-type line analog. Doctoral dissertation, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm, Sweden (1985)

  • The MathWorks: MATLAB, Version 7.6.0.324 (R2008a) (2008)

  • Morrison, G.S.: Theories of vowel inherent spectral change: A review. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 3). Springer, Heidelberg (2013a)

  • Morrison, G.S., Nearey, T.M.: Testing theories of vowel inherent spectral change. J. Acoust. Soc. Am. 122, EL15–EL22 (2007). doi:10.1121/1.2739111

  • Nearey, T.M.: Static, dynamic, and relational properties in vowel perception. J. Acoust. Soc. Am. 85, 2088–2113 (1989). doi:10.1121/1.397861

  • Nearey, T.M., Assmann, P.F.: Modeling the role of vowel inherent spectral change in vowel identification. J. Acoust. Soc. Am. 80, 1297–1308 (1986). doi:10.1121/1.394433

  • Nittrouer, S.: Dynamic spectral structure specifies vowels for children and adults. J. Acoust. Soc. Am. 122, 2328–2339 (2007). doi:10.1121/1.2769624

  • Stevens, K.N.: On the quantal theory of speech. J. Phonetics 17, 3–45 (1989)

  • Story, B.H.: Speech simulation with an enhanced wave-reflection model of the vocal tract. Ph.D. dissertation, University of Iowa (1995)

  • Story, B.H., Titze, I.R., Hoffman, E.A.: Vocal tract area functions from magnetic resonance imaging. J. Acoust. Soc. Am. 100, 537–554 (1996). doi:10.1121/1.415960

  • Story, B.H., Titze, I.R., Hoffman, E.A.: Vocal tract area functions for an adult female speaker based on volumetric imaging. J. Acoust. Soc. Am. 104, 471–487 (1998). doi:10.1121/1.423298

  • Story, B.H., Titze, I.R.: Parameterization of vocal tract area functions by empirical orthogonal modes. J. Phonetics 26, 223–260 (1998). doi:10.1006/jpho.1998.0076

  • Story, B.H., Titze, I.R., Hoffman, E.A.: The relationship of vocal tract shape to three voice qualities. J. Acoust. Soc. Am. 109, 1651–1667 (2001). doi:10.1121/1.1352085

  • Story, B.H.: Synergistic modes of vocal tract articulation for American English vowels. J. Acoust. Soc. Am. 118, 3834–3859 (2005). doi:10.1121/1.2118367

  • Story, B.H.: A technique for “tuning” vocal tract area functions based on acoustic sensitivity functions. J. Acoust. Soc. Am. 119(2), 715–718 (2006). doi:10.1121/1.2151802

  • Story, B.H.: Time-dependence of vocal tract modes during production of vowels and vowel sequences. J. Acoust. Soc. Am. 121, 3770–3789 (2007). doi:10.1121/1.2730621

  • Story, B.H.: Comparison of Magnetic Resonance Imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. J. Acoust. Soc. Am. 123, 327–335 (2008). doi:10.1121/1.2805683

  • Story, B.H.: Vocal tract modes based on multiple area function sets from one speaker. J. Acoust. Soc. Am. 125, EL141–EL147 (2009). doi:10.1121/1.3082263

  • Titze, I.R.: Parameterization of the glottal area, glottal flow, and vocal fold contact area. J. Acoust. Soc. Am. 75, 570–580 (1984). doi:10.1121/1.390530

  • Titze, I.R., Story, B.H.: Acoustic interactions of the voice source with the lower vocal tract. J. Acoust. Soc. Am. 101(4), 2234–2243 (1997). doi:10.1121/1.418246

  • Titze, I.R.: Regulating glottal airflow in phonation: Application of the maximum power transfer theorem to a low dimensional phonation model. J. Acoust. Soc. Am. 111, 367–376 (2002). doi:10.1121/1.1417526

  • Titze, I.R.: The myoelastic aerodynamic theory of phonation. National Cent. Voice Speech 1, 197–214 (2006)

  • Titze, I.R.: Nonlinear source-filter coupling in phonation: theory. J. Acoust. Soc. Am. 123(5), 2733–2749 (2008). doi:10.1121/1.2832337

  • Westbury, J.R.: X-ray Microbeam Speech Production Database User’s Handbook, Version 1.0. University of Wisconsin-Madison (1994)

Acknowledgments

This research was supported by NIH grant number R01-DC04789.

Author information

Corresponding author: Brad H. Story.

1 Electronic Supplementary Material

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Story, B.H., Bunton, K. (2013). Simulation and Identification of Vowels Based on a Time-Varying Model of the Vocal Tract Area Function. In: Morrison, G., Assmann, P. (eds) Vowel Inherent Spectral Change. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14209-3_7

  • DOI: https://doi.org/10.1007/978-3-642-14209-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14208-6

  • Online ISBN: 978-3-642-14209-3

  • eBook Packages: Engineering, Engineering (R0)
