Aerodynamic and Acoustic Theory of Voice Production

Chapter

Abstract

A theory of voice production for vowels has to deal with two related problems: the biomechanical modeling of vocal fold vibrations and the calculation of the volume-velocity airflow through the glottis, i.e., the glottal airflow. This report is a tutorial on the second problem, which we call the aerodynamic and acoustic theory of voice production. Calculating the glottal airflow is difficult because it depends on an interaction between (1) the nonlinear, time-varying glottal impedance, specified in the time domain, and (2) the subglottal and vocal tract input impedances, specified in the frequency domain. The effect of glottal geometry on the glottal impedance, and the role of glottal impedance elements such as kinetic resistance, viscous resistance and glottal inductance in determining the glottal airflow, are discussed. Methods to calculate the vocal tract or subglottal input impedance based on a transmission line analog model and a formant network model are presented. Equations to find the glottal airflow with source-filter interaction are derived. A digital pole-zero model of the input impedance is proposed for efficient and accurate computation of the glottal airflow. The role of various factors in determining the so-called residue, ripple and superposition components of the glottal airflow is discussed with examples. The time-domain response of a vowel is calculated using the glottal airflow with source-filter interaction. The instantaneous frequency and instantaneous bandwidth of an interactive vowel response are computed and interpreted. Further research is needed to extend the theory to breathy vowels, vowel onsets, and consonant-to-vowel and vowel-to-consonant transitions, where the acoustic waves are superposed on a large, dynamically changing mean airflow. A good understanding of the theory guides the appropriate modeling and interpretation of the voice source.
The relevant features of the voice source for a specific application, such as forensic speaker identification, can thus be identified. The author believes that habitually formed, relative dynamic variations in voice source parameters are of greater significance for forensic speaker recognition.
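The abstract names the elements of the glottal impedance (kinetic resistance, viscous resistance, glottal inductance) and a transmission line method for the vocal tract input impedance. The Python sketch below illustrates both pieces under stated assumptions; it is not the chapter's own formulation. It assumes a rectangular slit glottis with the common textbook expressions (kinetic term k·ρ·U²/(2A²), viscous term 12·μ·d·l²·U/A³, inductance ρ·d/A), an assumed coefficient k = 1, assumed SI constants, and a vocal tract made of lossless cylindrical sections joined by chain (ABCD) matrices. The function names and parameter values are hypothetical, and the sketch omits wall losses, radiation and the time-domain source-filter interaction that the chapter derives.

```python
import numpy as np

# Assumed physical constants in SI units (illustrative, not taken from the chapter).
RHO = 1.14     # air density at body temperature, kg/m^3
MU = 1.86e-5   # dynamic viscosity of air, Pa*s
C = 350.0      # speed of sound in warm, humid air, m/s


def glottal_elements(area, length=0.014, depth=0.003, k=1.0):
    """Glottal impedance elements for an assumed rectangular slit.

    Transglottal pressure is modelled as dP = a*U**2 + b*U + L*dU/dt, where
      a = k*rho/(2*A^2)        kinetic (Bernoulli) coefficient, k assumed
      b = 12*mu*d*l^2/A^3      viscous resistance of a slit of length l, depth d
      L = rho*d/A              glottal inductance (acoustic mass)
    """
    a = k * RHO / (2.0 * area**2)
    b = 12.0 * MU * depth * length**2 / area**3
    inductance = RHO * depth / area
    return a, b, inductance


def quasi_steady_flow(pressure, area, **slit):
    """Volume velocity U (m^3/s) for a steady transglottal pressure (Pa).

    The inductive term is dropped (dU/dt = 0), so U is the positive root of
    a*U^2 + b*U - dP = 0.
    """
    a, b, _ = glottal_elements(area, **slit)
    return (-b + np.sqrt(b * b + 4.0 * a * pressure)) / (2.0 * a)


def tube_input_impedance(freqs, areas, section_len, z_load=0.0):
    """Input impedance of cascaded lossless cylindrical sections, glottis end first.

    Each section of area A and length l has the chain matrix
      [[cos(kl), j*Z0*sin(kl)], [j*sin(kl)/Z0, cos(kl)]],  Z0 = rho*c/A,
    and z_load terminates the far end (0 approximates an ideal open end).
    """
    z_in = np.empty(len(freqs), dtype=complex)
    for i, f in enumerate(freqs):
        kl = 2.0 * np.pi * f / C * section_len
        m = np.eye(2, dtype=complex)
        for area in areas:
            z0 = RHO * C / area
            section = np.array([[np.cos(kl), 1j * z0 * np.sin(kl)],
                                [1j * np.sin(kl) / z0, np.cos(kl)]])
            m = m @ section
        # Z_in = (A*Z_L + B) / (C*Z_L + D) for the overall chain matrix.
        z_in[i] = (m[0, 0] * z_load + m[0, 1]) / (m[1, 0] * z_load + m[1, 1])
    return z_in


u = quasi_steady_flow(784.0, 1e-5)              # 8 cm H2O across 0.1 cm^2
freqs = np.arange(101.0, 1000.0, 2.0)
z = tube_input_impedance(freqs, [3e-4] * 35, 0.005)  # uniform 17.5 cm tube, 3 cm^2
print(u, freqs[np.argmax(np.abs(z))])
```

With these assumed values the steady flow comes out at roughly 3.6e-4 m^3/s, and the uniform 17.5 cm tube shows its first input-impedance peak near c/(4L) = 500 Hz, the first resonance of a tube closed at the glottis and open at the lips. In a real computation the glottal area varies with time, so the inductive term and the frequency-dependent tract impedance must be handled together, which is the interaction problem the abstract describes.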

Keywords

Covariance, Respiration, Adduct, Convolution, Sine


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

Voice and Speech Systems, Malleswaram, Bangalore, India
