Speech Production and Modelling

  • Tom Bäckström
Part of the Signals and Communication Technology book series (SCT)


Humans produce speech sounds by pushing air out of the lungs. The airflow sets the vocal folds into oscillation and becomes turbulent at constrictions in the vocal tract. The flow waveform thus created is further shaped by the resonances of the vocal tract. Together, these features form the characteristic properties of phones. For efficient coding, we must model these features with a minimum number of parameters without altering the perceptual impression.
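
The source-and-resonance view described above can be illustrated with a minimal sketch. It assumes an impulse train as a stand-in for the vocal-fold excitation and a cascade of two-pole resonators as a stand-in for the vocal tract resonances. The function names, the formant frequencies and bandwidths, and the sampling rate are illustrative assumptions, not values or methods taken from the chapter.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Return (a1, a2) for the two-pole filter y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius set by the bandwidth
    theta = 2.0 * math.pi * freq_hz / fs        # pole angle set by the centre frequency
    return -2.0 * r * math.cos(theta), r * r

def synthesize_vowel(f0=120.0, formants=((700.0, 80.0), (1200.0, 90.0)),
                     fs=16000, duration=0.05):
    """Hypothetical helper: impulse-train source filtered by cascaded resonators."""
    n = int(fs * duration)
    period = int(round(fs / f0))
    # Source: one impulse per pitch period (a crude model of voiced excitation).
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    # Filter: each (frequency, bandwidth) pair adds one resonance.
    for freq_hz, bandwidth_hz in formants:
        a1, a2 = resonator_coeffs(freq_hz, bandwidth_hz, fs)
        y1 = y2 = 0.0
        filtered = []
        for x in signal:
            y = x - a1 * y1 - a2 * y2
            filtered.append(y)
            y2, y1 = y1, y
        signal = filtered
    return signal

samples = synthesize_vowel()
```

With the default arguments, the whole sound is described by five numbers: the fundamental frequency plus a centre frequency and a bandwidth for each of the two resonances. This is the kind of compact parametrisation that makes such models attractive for coding, although a practical codec would need a more realistic excitation and filter than this sketch provides.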



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. International Audio Laboratories Erlangen (AudioLabs), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
