Encyclopedia of Biometrics

Living Edition
Editors: Stan Z. Li, Anil K. Jain

Voice Sample Synthesis

  • Juergen Schroeter
  • Alistair Conkie
Living reference work entry
DOI: https://doi.org/10.1007/978-3-642-27733-7_6-3



Over the last decade, speech synthesis, the technology that enables machines to talk to humans, has become so natural sounding that a naïve listener might assume they are listening to a recording of a live human speaker. Speech synthesis is not new; it took several decades to arrive where it is today. Early work started from physics-based models of the vocal tract, and it took many years of research to perfect the encapsulation of the vocal tract's acoustic properties as a “black box” in so-called formant synthesizers. Then, with the help of ever more powerful computing technology, it became viable to use snippets of recorded speech directly and glue them together to create new sentences, the approach taken by concatenative synthesizers. Combined with now-available methods for fast search, this approach evaluates potentially millions of choices to find the optimal...
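The search over candidate snippets described above can be illustrated with a small dynamic-programming sketch in the style of the Viterbi algorithm. This is only an illustration under strong assumptions, not the method of any particular synthesizer: each recorded unit is reduced to a single number (think of it as a pitch value in Hz), and the function name `select_units`, the cost functions, and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[float]],
    targets: Sequence[float],
    target_cost: Callable[[float, float], float],
    join_cost: Callable[[float, float], float],
) -> Tuple[List[int], float]:
    """Pick one candidate per position so that the summed target and
    join costs are minimal. Returns (chosen indices, total cost)."""
    n = len(targets)
    # best[i][j]: cheapest cost of any path ending at candidate j of position i
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor candidate on that cheapest path
    back = [[-1] * len(candidates[0])]

    for i in range(1, n):
        row, ptr = [], []
        for u in candidates[i]:
            # Cost of reaching u from each candidate at the previous position
            costs = [
                best[i - 1][k] + join_cost(p, u)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(targets[i], u))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace the cheapest path backwards from the best final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


# Toy data (assumed): three positions, two candidate units each.
candidates = [[100.0, 120.0], [110.0, 150.0], [115.0, 90.0]]
targets = [100.0, 110.0, 115.0]

path, total = select_units(
    candidates,
    targets,
    target_cost=lambda t, u: abs(t - u),        # mismatch with the desired value
    join_cost=lambda p, u: 0.5 * abs(p - u),    # discontinuity at the join
)
print(path, total)  # [0, 0, 0] 7.5
```

The work grows with the number of positions times the square of the candidates per position, rather than with the number of possible sequences, which is what makes evaluating a very large number of choices feasible.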


Automatic Speech Recognition; Vocal Tract; Speech Synthesis; Unit Selection; Speaker Verification System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. AT&T Labs Research, Florham Park, USA