Simulation and Identification of Vowels Based on a Time-Varying Model of the Vocal Tract Area Function

Story, Brad H.; Bunton, Kate

doi:10.1007/978-3-642-14209-3_7

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

In their purest form, vowels can be conceived as being produced with static configurations of the vocal tract shape. Laboratory measurements of both acoustic and articulatory characteristics of vowels are typically performed under this assumption. In natural, connected speech, however, the vocal tract shape undergoes nearly continuous change; thus, a true “static” configuration is rarely produced. Listeners are able to identify vowels in this time-varying situation, often with greater accuracy than for a vowel deliberately produced without any vocal tract change. This chapter examines the time-varying changes of the vocal tract shape that produce vowel inherent spectral change. Specifically, a model of the vocal tract area function is used to investigate how time-dependent formant frequencies originate from movement of the vocal tract.

Notes

1.
Although any artificially-generated speech is strictly synthetic, the term simulated speech is used in this article to denote that it is produced, to some degree, by simulating the physical processes of human sound production. These consist primarily of vocal fold vibration, acoustic wave propagation in the tracheal, nasal, and vocal tract systems, and the radiated acoustic output. In contrast, formant synthesis is an attempt to replicate the acoustic properties of the speech output signal, but not necessarily any of the physical processes that produce those properties.
2.
The publications in which each area function set was reported are as follows: SF0 (Story et al. 1998); SF1, SF2, SF3, SM1, SM2, and SM3 (Story 2005); SM0 (Story et al. 1996; reported again, in resampled form, in Story and Titze 1998); SM0-2 (Story 2008).
3.
Following Story (2005, 2009), the PCA was performed on the equivalent diameters of the cross-sectional areas rather than on the areas themselves.
4.
To obtain spectro-temporal information for the vowel simulation, time-dependent formant frequencies were obtained from productions of eleven American English vowels (/i, ɪ, e, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/), spoken in citation form by an adult male speaker. The vowels were recorded in a sound-treated room with an AKG CS1000 microphone. The signal was acquired in digital form at a sampling frequency of 44.1 kHz with a Kay Elemetrics CSL4400 and saved to a file in “wav” format. Formant frequencies were then estimated over the time course of each vowel with the formant analysis module in Praat (Boersma and Weenink 2009). Formant analysis parameters were manually adjusted so that the formant contours of F1 and F2 were aligned with the centers of their respective formant bands in a simultaneously-displayed wide-band spectrogram. Fundamental frequency (F0) and intensity contours (I0) for each vowel were also extracted with the appropriate Praat modules. All formant, F0, and I0 contours were transferred to vector form in Matlab (Mathworks 2008) for further processing.
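
As an illustration of the diameter-based PCA mentioned in Note 3, the following is a minimal sketch, not the authors' implementation. It assumes that each area function is stored as one row of an array, that the equivalent diameter is that of a circle with the same cross-sectional area, and that the PCA is computed by a singular value decomposition of the mean-centered diameters. The function names, the array layout, and the toy data are all hypothetical.

```python
import numpy as np

def areas_to_diameters(areas):
    """Equivalent diameter of a circle with the given cross-sectional area."""
    return np.sqrt(4.0 * np.asarray(areas, dtype=float) / np.pi)

def diameter_pca(areas, n_modes=2):
    """PCA on equivalent diameters rather than on the areas themselves.

    areas: array of shape (n_vowels, n_sections); this layout is an assumption.
    Returns the mean diameter function, the first n_modes modes (one per row),
    and the per-vowel scaling coefficients for those modes.
    """
    d = areas_to_diameters(areas)
    mean_d = d.mean(axis=0)          # plays the role of the mean diameter function
    centered = d - mean_d
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    coeffs = centered @ modes.T      # projection of each vowel onto each mode
    return mean_d, modes, coeffs

# Toy data only: 5 made-up "vowels" with 8 sections each, not measured values.
rng = np.random.default_rng(0)
toy_areas = rng.uniform(0.3, 5.0, size=(5, 8))
mean_d, modes, coeffs = diameter_pca(toy_areas)
```

Working in diameters means that a linear combination of the mean and the modes is squared when it is converted back to an area, so the reconstructed areas cannot become negative.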

Abbreviations

\(\Omega (x)\) :: Mean vocal tract diameter function
\(\phi _1(x)\) :: Mode 1 (first principal component)
\(\phi _2(x)\) :: Mode 2 (second principal component)
\(A(x)\) :: Vocal tract area function
\(A(x,t)\) :: Time-dependent vocal tract area function
\(q_1\) :: Scaling coefficient of the first mode
\(q_1(t)\) :: Time-dependent version of \(q_1\)
\(q_2\) :: Scaling coefficient of the second mode
\(q_2(t)\) :: Time-dependent version of \(q_2\)
F0 :: Fundamental frequency
F1 :: First formant frequency
F2 :: Second formant frequency
F3 :: Third formant frequency
I0 :: Intensity
MRI :: Magnetic resonance imaging
PCA :: Principal component analysis
VISC :: Vowel inherent spectral change
XRMB :: X-ray microbeam
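
The symbols listed above can be combined into a time-dependent area function. The sketch below assumes the form reported in Story (2005), \(A(x,t) = \frac{\pi}{4}[\Omega(x) + q_1(t)\phi_1(x) + q_2(t)\phi_2(x)]^2\); that equation is not reproduced on this page, so it should be read as an assumption. The function name, the number of sections, and every numerical value are made up for illustration.

```python
import numpy as np

def area_function(omega, phi1, phi2, q1, q2):
    """Assumed form: A(x,t) = (pi/4) * [Omega(x) + q1(t)*phi1(x) + q2(t)*phi2(x)]**2

    omega, phi1, phi2: arrays of shape (n_sections,)
    q1, q2: scalars or arrays of shape (n_frames,)
    Returns an array of shape (n_frames, n_sections).
    """
    q1 = np.atleast_1d(np.asarray(q1, dtype=float))[:, None]
    q2 = np.atleast_1d(np.asarray(q2, dtype=float))[:, None]
    diameter = omega[None, :] + q1 * phi1[None, :] + q2 * phi2[None, :]
    return (np.pi / 4.0) * diameter ** 2

# Made-up shapes and trajectories, chosen only to exercise the function.
n_sections = 20                              # hypothetical section count
x = np.linspace(0.0, 1.0, n_sections)        # normalized distance from the glottis
omega = 1.5 + 0.5 * np.sin(np.pi * x)        # stand-in mean diameter function
phi1 = 0.2 * np.cos(np.pi * x)               # stand-in first mode
phi2 = 0.2 * np.cos(2.0 * np.pi * x)         # stand-in second mode
n_frames = 50
q1_t = np.linspace(-2.0, 2.0, n_frames)      # stand-in q1(t) trajectory
q2_t = np.linspace(1.0, -1.0, n_frames)      # stand-in q2(t) trajectory
A_xt = area_function(omega, phi1, phi2, q1_t, q2_t)
```

Setting both coefficients to zero returns the area function of the mean diameter alone, which is a convenient sanity check on any implementation of this form.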

References

Baer, T., Gore, J.C., Gracco, L.C., Nye, P.W.: Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels. J. Acoust. Soc. Am. 90, 799–828 (1991). doi:10.1121/1.401949
Boersma, P., Weenink, D.: Praat, Version 5.1. www.praat.org (2009). Last viewed February 2, 2009
Bunton, K., Story, B.H.: Identification of synthetic vowels based on selected vocal tract area functions. J. Acoust. Soc. Am. 125, 19–22 (2009). doi:10.1121/1.3033740
Bunton, K., Story, B.H.: Identification of synthetic vowels based on a time-varying model of the vocal tract area function. J. Acoust. Soc. Am. 127, EL146–EL152 (2010). doi:10.1121/1.3313921
Fant, G.: The Acoustic Theory of Speech Production. Mouton, The Hague (1960)
Hillenbrand, J., Getty, L.A., Clark, M.J., Wheeler, K.: Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97, 3099–3111 (1995). doi:10.1121/1.409456
Hillenbrand, J., Nearey, T.: Identification of resynthesized /hVd/ utterances: Effects of formant contour. J. Acoust. Soc. Am. 105, 3509–3523 (1999). doi:10.1121/1.411694
Hillenbrand, J., Clark, M., Houde, R.: Some effects of duration on vowel recognition. J. Acoust. Soc. Am. 108, 3013–3022 (2000). doi:10.1121/1.1323463
Hillenbrand, J.M.: Static and dynamic approaches to understanding vowel perception. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 2). Springer, Heidelberg (2013)
Klatt, D.H., Klatt, L.C.: Analysis, synthesis, and perception of voice quality variations among male and female talkers. J. Acoust. Soc. Am. 87, 820–857 (1990). doi:10.1121/1.398894
Labov, W., Ash, S., Boberg, C.: The Atlas of North American English: Phonetics. Mouton de Gruyter, Berlin (2006)
Liljencrants, J.: Speech synthesis with a reflection-type line analog. DS Dissertation, Department of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm, Sweden (1985)
The Mathworks: Matlab, Version 7.6.0.324 (R2008a) (2008)
Morrison, G.S.: Theories of vowel inherent spectral change: A review. In: Morrison, G.S., Assmann, P.F. (eds.) Vowel Inherent Spectral Change (Chap. 3). Springer, Heidelberg (2013a)
Morrison, G.S., Nearey, T.M.: Testing theories of vowel inherent spectral change. J. Acoust. Soc. Am. 122, EL15–EL22 (2007). doi:10.1121/1.2739111
Nearey, T.M.: Static, dynamic, and relational properties in vowel perception. J. Acoust. Soc. Am. 85, 2088–2113 (1989). doi:10.1121/1.397861
Nearey, T.M., Assmann, P.F.: Modeling the role of vowel inherent spectral change in vowel identification. J. Acoust. Soc. Am. 80, 1297–1308 (1986). doi:10.1121/1.394433
Nittrouer, S.: Dynamic spectral structure specifies vowels for children and adults. J. Acoust. Soc. Am. 122, 2328–2339 (2007). doi:10.1121/1.2769624
Stevens, K.N.: On the quantal theory of speech. J. Phonetics 17, 3–45 (1989)
Story, B.H.: Speech simulation with an enhanced wave-reflection model of the vocal tract. Ph.D. Dissertation, University of Iowa (1995)
Story, B.H., Titze, I.R., Hoffman, E.A.: Vocal tract area functions from magnetic resonance imaging. J. Acoust. Soc. Am. 100, 537–554 (1996). doi:10.1121/1.415960
Story, B.H., Titze, I.R., Hoffman, E.A.: Vocal tract area functions for an adult female speaker based on volumetric imaging. J. Acoust. Soc. Am. 104, 471–487 (1998). doi:10.1121/1.423298
Story, B.H., Titze, I.R.: Parameterization of vocal tract area functions by empirical orthogonal modes. J. Phonetics 26, 223–260 (1998). doi:10.1006/jpho.1998.0076
Story, B.H., Titze, I.R., Hoffman, E.A.: The relationship of vocal tract shape to three voice qualities. J. Acoust. Soc. Am. 109, 1651–1667 (2001). doi:10.1121/1.1352085
Story, B.H.: Synergistic modes of vocal tract articulation for American English vowels. J. Acoust. Soc. Am. 118, 3834–3859 (2005). doi:10.1121/1.2118367
Story, B.H.: A technique for “tuning” vocal tract area functions based on acoustic sensitivity functions. J. Acoust. Soc. Am. 119(2), 715–718 (2006). doi:10.1121/1.2151802
Story, B.H.: Time-dependence of vocal tract modes during production of vowels and vowel sequences. J. Acoust. Soc. Am. 121, 3770–3789 (2007). doi:10.1121/1.2730621
Story, B.H.: Comparison of Magnetic Resonance Imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. J. Acoust. Soc. Am. 123, 327–335 (2008). doi:10.1121/1.2805683
Story, B.H.: Vocal tract modes based on multiple area function sets from one speaker. J. Acoust. Soc. Am. 125, EL141–EL147 (2009). doi:10.1121/1.3082263
Titze, I.R.: Parameterization of the glottal area, glottal flow, and vocal fold contact area. J. Acoust. Soc. Am. 75, 570–580 (1984). doi:10.1121/1.390530
Titze, I.R., Story, B.H.: Acoustic interactions of the voice source with the lower vocal tract. J. Acoust. Soc. Am. 101(4), 2234–2243 (1997). doi:10.1121/1.418246
Titze, I.R.: Regulating glottal airflow in phonation: Application of the maximum power transfer theorem to a low dimensional phonation model. J. Acoust. Soc. Am. 111, 367–376 (2002). doi:10.1121/1.1417526
Titze, I.R.: The myoelastic aerodynamic theory of phonation. National Cent. Voice Speech 1, 197–214 (2006)
Titze, I.R.: Nonlinear source-filter coupling in phonation: theory. J. Acoust. Soc. Am. 123(5), 2733–2749 (2008). doi:10.1121/1.2832337
Westbury, J.R.: X-ray microbeam speech production database user’s handbook, version 1.0. UW-Madison (1994)

Acknowledgments

This research was supported by NIH grant number R01-DC04789.

Author information

Authors and Affiliations

Department of Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ, USA
Brad H. Story & Kate Bunton

Corresponding author

Correspondence to Brad H. Story.

Editor information

Editors and Affiliations

School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, 2052, New South Wales, Australia
Geoffrey Stewart Morrison
School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX, 75083, USA
Peter F. Assmann

Electronic Supplementary Material

ESM 1 (AUDIO 50 KB)

ESM 2 (AUDIO 1031 KB)

ESM 3 (AUDIO 792 KB)

ESM 4 (AUDIO 37 KB)

ESM 5 (AUDIO 38 KB)

ESM 6 (AUDIO 38 KB)

ESM 7 (AUDIO 38 KB)

ESM 8 (AUDIO 38 KB)

ESM 9 (AUDIO 38 KB)

ESM 10 (AUDIO 38 KB)

ESM 11 (AUDIO 38 KB)

ESM 12 (AUDIO 38 KB)

ESM 13 (AUDIO 38 KB)

ESM 14 (AUDIO 38 KB)

About this chapter

Cite this chapter

Story, B.H., Bunton, K. (2013). Simulation and Identification of Vowels Based on a Time-Varying Model of the Vocal Tract Area Function. In: Morrison, G., Assmann, P. (eds) Vowel Inherent Spectral Change. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14209-3_7

DOI: https://doi.org/10.1007/978-3-642-14209-3_7
Published: 14 December 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14208-6
Online ISBN: 978-3-642-14209-3
eBook Packages: Engineering, Engineering (R0)
