Abstract
In their purest form, vowels can be conceived as being produced with static configurations of the vocal tract. Laboratory measurements of both the acoustic and the articulatory characteristics of vowels are typically performed under this assumption. In natural, connected speech, however, the vocal tract shape changes nearly continuously, so a truly “static” configuration is rarely produced. Listeners are nonetheless able to identify vowels in this time-varying situation, often with greater accuracy than for vowels deliberately produced without any vocal tract change. This chapter examines the time-varying changes in vocal tract shape that produce vowel inherent spectral change (VISC). Specifically, a model of the vocal tract area function is used to investigate how time-dependent formant frequencies originate from movement of the vocal tract.
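The idea that formant frequencies follow from the shape of the area function can be illustrated with a textbook frequency-domain calculation. The sketch below is a swapped-in technique, not necessarily the computation used in the chapter. It treats the vocal tract as a lossless chain of short uniform tube sections (ignoring wall losses, radiation, and nasal coupling) and locates formants where the glottis-to-lips chain-matrix element K22 crosses zero. The constants, section count, and example area functions are assumptions chosen for illustration.

```python
import numpy as np

# Assumed physical constants (CGS units); not taken from the chapter.
C_SOUND = 35000.0  # speed of sound in warm, humid air, cm/s
RHO = 0.00114      # air density, g/cm^3


def k22(freqs_hz, areas_cm2, section_len_cm):
    """K22 element of the glottis-to-lips chain matrix of a lossless tube.

    areas_cm2 is ordered from glottis to lips. Each section of area A and
    length l contributes [[cos kl, j(rho c/A) sin kl], [j(A/(rho c)) sin kl, cos kl]].
    With zero acoustic pressure at the lips, U_lips / U_glottis = 1 / K22,
    so the resonances (formants) are the zeros of K22.
    """
    kl = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) * section_len_cm / C_SOUND
    cos_kl, sin_kl = np.cos(kl), np.sin(kl)
    # Running product K = [[m11, m12], [m21, m22]], starting at the identity.
    m11 = np.ones_like(kl, dtype=complex)
    m12 = np.zeros_like(kl, dtype=complex)
    m21 = np.zeros_like(kl, dtype=complex)
    m22 = np.ones_like(kl, dtype=complex)
    for area in areas_cm2:
        z = RHO * C_SOUND / area  # characteristic acoustic impedance of the section
        s12, s21 = 1j * z * sin_kl, 1j * sin_kl / z
        # K <- K @ S for this section (right-multiply, glottis toward lips).
        m11, m12, m21, m22 = (m11 * cos_kl + m12 * s21, m11 * s12 + m12 * cos_kl,
                              m21 * cos_kl + m22 * s21, m21 * s12 + m22 * cos_kl)
    return m22.real  # K22 is purely real when there are no losses


def formants(areas_cm2, tract_len_cm, fmax_hz=5000.0, df_hz=1.0):
    """Resonance frequencies below fmax_hz, from sign changes of K22 on a grid."""
    dx = tract_len_cm / len(areas_cm2)
    freqs = np.arange(df_hz, fmax_hz, df_hz)
    vals = k22(freqs, areas_cm2, dx)
    neg = np.signbit(vals)
    idx = np.nonzero(neg[:-1] != neg[1:])[0]
    f0, f1, v0, v1 = freqs[idx], freqs[idx + 1], vals[idx], vals[idx + 1]
    return f0 - v0 * (f1 - f0) / (v1 - v0)  # linear interpolation of each zero


# Uniform 17.5 cm tube: resonances should lie near (2n - 1) c / (4 L) = 500, 1500, 2500 Hz, ...
uniform = np.full(44, 3.0)
print(np.round(formants(uniform, 17.5)))

# Hypothetical "movement": blend from the uniform tube toward a tube with a wide
# back half and a narrow front half, tracking F1 and F2 at each step.
target = np.concatenate([np.full(22, 5.0), np.full(22, 1.0)])
for t in np.linspace(0.0, 1.0, 5):
    f = formants((1.0 - t) * uniform + t * target, 17.5)
    print(f"t = {t:.2f}: F1 = {f[0]:.0f} Hz, F2 = {f[1]:.0f} Hz")
```

Because every section matrix has a real diagonal and imaginary off-diagonal entries, K22 stays real in the lossless case, so a sign-change search is sufficient; a model with losses would instead require locating peaks of |1/K22|.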
Notes
- 1.
Although any artificially generated speech is strictly synthetic, the term simulated speech is used in this chapter to denote speech produced, to some degree, by simulating the physical processes of human sound production. These consist primarily of vocal fold vibration, acoustic wave propagation in the tracheal, nasal, and vocal tract systems, and radiation of the acoustic output. In contrast, formant synthesis attempts to replicate the acoustic properties of the speech output signal, but not necessarily any of the physical processes that produce those properties.
- 2.
- 3.
- 4.
To obtain spectro-temporal information for the vowel simulation, time-dependent formant frequencies were extracted from productions of eleven American English vowels (/i, ɪ, e, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/) spoken in citation form by an adult male speaker. The vowels were recorded in a sound-treated room with an AKG CS1000 microphone. The signal was digitized at a sampling frequency of 44.1 kHz with a Kay Elemetrics CSL4400 and saved as a WAV file. Formant frequencies were then estimated over the time course of each vowel with the formant analysis module in Praat (Boersma and Weenink 2009). The analysis parameters were adjusted manually until the F1 and F2 contours were aligned with the centers of their respective formant bands in a simultaneously displayed wide-band spectrogram. Fundamental frequency (F0) and intensity (I0) contours for each vowel were also extracted with the appropriate Praat modules. All formant, F0, and I0 contours were transferred to vector form in Matlab (Mathworks 2008) for further processing.
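Note 1 contrasts simulated speech with formant synthesis. For contrast, here is a minimal sketch of formant synthesis: a source signal is passed through a cascade of second-order digital resonators of the form commonly used in Klatt's formant synthesizers, so the output spectrum is shaped directly from formant frequency and bandwidth values without modeling any vocal tract physics. The sampling rate, impulse-train source, and formant/bandwidth values are illustrative assumptions, and the formants are held fixed rather than varying over time.

```python
import numpy as np


def resonator(x, freq_hz, bw_hz, fs_hz):
    """Second-order digital resonator y[n] = A x[n] + B y[n-1] + C y[n-2].

    Coefficients follow the form commonly used in Klatt-style synthesizers;
    A is chosen so that the gain at 0 Hz is exactly 1.
    """
    T = 1.0 / fs_hz
    C = -np.exp(-2.0 * np.pi * bw_hz * T)
    B = 2.0 * np.exp(-np.pi * bw_hz * T) * np.cos(2.0 * np.pi * freq_hz * T)
    A = 1.0 - B - C
    y = np.zeros(len(x))
    y1 = y2 = 0.0  # previous two outputs
    for n, xn in enumerate(x):
        yn = A * xn + B * y1 + C * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y


def cascade_synth(source, formants_hz, bandwidths_hz, fs_hz):
    """Pass the source through one resonator per formant, in series."""
    out = np.asarray(source, dtype=float)
    for f, bw in zip(formants_hz, bandwidths_hz):
        out = resonator(out, f, bw, fs_hz)
    return out


fs = 10000                      # sampling rate, Hz (illustrative)
f0 = 100                        # fundamental frequency, Hz (illustrative)
source = np.zeros(int(0.5 * fs))
source[:: fs // f0] = 1.0       # crude impulse-train "glottal" source
# Formant and bandwidth values are illustrative, not measurements from the chapter.
vowel = cascade_synth(source, [500, 1500, 2500], [60, 90, 150], fs)
```

Nothing in this sketch corresponds to vocal fold vibration or wave propagation in a tube; the formants are imposed as filter parameters, which is exactly the distinction Note 1 draws.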
Abbreviations
- \(\Omega(x)\): Mean vocal tract diameter function
- \(\phi_1(x)\): Mode 1 (first principal component)
- \(\phi_2(x)\): Mode 2 (second principal component)
- \(A(x)\): Vocal tract area function
- \(A(x,t)\): Time-dependent vocal tract area function
- \(q_1\): Scaling coefficient of the first mode
- \(q_1(t)\): Time-dependent version of \(q_1\)
- \(q_2\): Scaling coefficient of the second mode
- \(q_2(t)\): Time-dependent version of \(q_2\)
- F0: Fundamental frequency
- F1: First formant frequency
- F2: Second formant frequency
- F3: Third formant frequency
- I0: Intensity
- MRI: Magnetic resonance imaging
- PCA: Principal component analysis
- VISC: Vowel inherent spectral change
- XRMB: X-ray microbeam
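Several of the symbols above combine into a single expression. Assuming the chapter follows the two-mode area function model in the form given by Story and Titze (1998) and Story (2005), the mean diameter function \(\Omega(x)\) is perturbed by the two modes, and the result is squared and scaled by \(\pi/4\) to convert diameter to cross-sectional area at each position \(x\) along the vocal tract. Letting the mode coefficients vary with time gives the time-dependent area function:

```latex
A(x) = \frac{\pi}{4}\left[\Omega(x) + q_1\,\phi_1(x) + q_2\,\phi_2(x)\right]^2,
\qquad
A(x,t) = \frac{\pi}{4}\left[\Omega(x) + q_1(t)\,\phi_1(x) + q_2(t)\,\phi_2(x)\right]^2
```

With \(q_1 = q_2 = 0\) the model reduces to the area function implied by the mean diameter alone, \(A(x) = \frac{\pi}{4}\Omega(x)^2\).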
References
Baer, T., Gore, J.C., Gracco, L.C., Nye, P.W.: Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels. J. Acoust. Soc. Am. 90, 799–828 (1991). doi:10.1121/1.401949
Boersma, P., Weenink, D.: Praat, Version 5.1. www.praat.org (2009). Last viewed February 2, 2009
Bunton, K., Story, B.H.: Identification of synthetic vowels based on selected vocal tract area functions. J. Acoust. Soc. Am. 125, 19–22 (2009). doi:10.1121/1.3033740
Bunton, K., Story, B.H.: Identification of synthetic vowels based on a time-varying model of the vocal tract area function. J. Acoust. Soc. Am. 127, EL146–EL152 (2010). doi:10.1121/1.3313921
Fant, G.: The Acoustic Theory of Speech Production. Mouton, The Hague (1960)
Hillenbrand, J., Getty, L.A., Clark, M.J., Wheeler, K.: Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97, 3099–3111 (1995). doi:10.1121/1.409456
Hillenbrand, J., Nearey, T.: Identification of resynthesized /hVd/ utterances: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509–3523 (1999). doi:10.1121/1.411694
Hillenbrand, J., Clark, M., Houde, R.: Some effects of duration on vowel recognition. J. Acoust. Soc. Am. 108, 3013–3022 (2000). doi:10.1121/1.1323463
Hillenbrand, J.M.: Static and dynamic approaches to understanding vowel perception. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 2). Springer, Heidelberg (2013)
Klatt, D.H., Klatt, L.C.: Analysis, synthesis, and perception of voice quality variations among male and female talkers. J. Acoust. Soc. Am. 87, 820–857 (1990). doi:10.1121/1.398894
Labov, W., Ash, S., Boberg, C.: The Atlas of North American English: Phonetics. Mouton de Gruyter, Berlin (2006)
Liljencrants, J.: Speech synthesis with a reflection-type line analog. DS Dissertation, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm, Sweden (1985)
The MathWorks: MATLAB, Version 7.6.0.324 (R2008a) (2008)
Morrison, G.S.: Theories of vowel inherent spectral change: A review. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 3). Springer, Heidelberg (2013a)
Morrison, G.S., Nearey, T.M.: Testing theories of vowel inherent spectral change. J. Acoust. Soc. Am. 122, EL15–EL22 (2007). doi:10.1121/1.2739111
Nearey, T.M.: Static, dynamic, and relational properties in vowel perception. J. Acoust. Soc. Am. 85, 2088–2113 (1989). doi:10.1121/1.397861
Nearey, T.M., Assmann, P.F.: Modeling the role of vowel inherent spectral change in vowel identification. J. Acoust. Soc. Am. 80, 1297–1308 (1986). doi:10.1121/1.394433
Nittrouer, S.: Dynamic spectral structure specifies vowels for children and adults. J. Acoust. Soc. Am. 122, 2328–2339 (2007). doi:10.1121/1.2769624
Stevens, K.N.: On the quantal theory of speech. J. Phonetics 17, 3–45 (1989)
Story, B.H.: Speech simulation with an enhanced wave-reflection model of the vocal tract. Ph.D. Dissertation, University of Iowa (1995)
Story, B.H., Titze, I.R., Hoffman, E.A.: Vocal tract area functions from magnetic resonance imaging. J. Acoust. Soc. Am. 100, 537–554 (1996). doi:10.1121/1.415960
Story, B.H., Titze, I.R., Hoffman, E.A.: Vocal tract area functions for an adult female speaker based on volumetric imaging. J. Acoust. Soc. Am. 104, 471–487 (1998). doi:10.1121/1.423298
Story, B.H., Titze, I.R.: Parameterization of vocal tract area functions by empirical orthogonal modes. J. Phonetics 26, 223–260 (1998). doi:10.1006/jpho.1998.0076
Story, B.H., Titze, I.R., Hoffman, E.A.: The relationship of vocal tract shape to three voice qualities. J. Acoust. Soc. Am. 109, 1651–1667 (2001). doi:10.1121/1.1352085
Story, B.H.: Synergistic modes of vocal tract articulation for American English vowels. J. Acoust. Soc. Am. 118, 3834–3859 (2005). doi:10.1121/1.2118367
Story, B.H.: A technique for “tuning” vocal tract area functions based on acoustic sensitivity functions. J. Acoust. Soc. Am. 119(2), 715–718 (2006). doi:10.1121/1.2151802
Story, B.H.: Time-dependence of vocal tract modes during production of vowels and vowel sequences. J. Acoust. Soc. Am. 121, 3770–3789 (2007). doi:10.1121/1.2730621
Story, B.H.: Comparison of Magnetic Resonance Imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. J. Acoust. Soc. Am. 123, 327–335 (2008). doi:10.1121/1.2805683
Story, B.H.: Vocal tract modes based on multiple area function sets from one speaker. J. Acoust. Soc. Am. 125, EL141–EL147 (2009). doi:10.1121/1.3082263
Titze, I.R.: Parameterization of the glottal area, glottal flow, and vocal fold contact area. J. Acoust. Soc. Am. 75, 570–580 (1984). doi:10.1121/1.390530
Titze, I.R., Story, B.H.: Acoustic interactions of the voice source with the lower vocal tract. J. Acoust. Soc. Am. 101(4), 2234–2243 (1997). doi:10.1121/1.418246
Titze, I.R.: Regulating glottal airflow in phonation: Application of the maximum power transfer theorem to a low dimensional phonation model. J. Acoust. Soc. Am. 111, 367–376 (2002). doi:10.1121/1.1417526
Titze, I.R.: The myoelastic aerodynamic theory of phonation. National Cent. Voice Speech 1, 197–214 (2006)
Titze, I.R.: Nonlinear source-filter coupling in phonation: theory. J. Acoust. Soc. Am. 123(5), 2733–2749 (2008). doi:10.1121/1.2832337
Westbury, J.R.: X-ray Microbeam Speech Production Database User’s Handbook (Version 1.0). University of Wisconsin-Madison (1994)
Acknowledgments
This research was supported by NIH grant number R01-DC04789.
1 Electronic Supplementary Material
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Story, B.H., Bunton, K. (2013). Simulation and Identification of Vowels Based on a Time-Varying Model of the Vocal Tract Area Function. In: Morrison, G., Assmann, P. (eds) Vowel Inherent Spectral Change. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14209-3_7
DOI: https://doi.org/10.1007/978-3-642-14209-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14208-6
Online ISBN: 978-3-642-14209-3
eBook Packages: Engineering, Engineering (R0)