Learning About Speech from Data: Beyond NETtalk

  • Robert I. Damper
Part of the Telecommunications Technology & Applications Series book series (TTAP)


Speech synthesis is an emerging technology with a wide range of potential applications. In most such applications, the message to be spoken will be in the form of text input, so the main focus of development is text-to-speech (TTS) synthesis. Strongly influenced by the academic traditions of generative linguistics, early work on TTS systems took it as axiomatic that a knowledge-based approach was essential to successful implementation. Presumed theoretical constraints on the learnability of their native language by humans were applied by extension to machine learners to conclude the futility of trying to make useful ‘blank slate’ inferences about speech and language simply from exposure. This situation has changed dramatically in recent years with the easy availability of computers to act as machine learners and large databases to act as training resources. Many positive achievements in machine learning have comprehensively proven its usefulness in a range of natural language processing tasks, despite the negative assumptions of earlier times. Thus, contemporary speech synthesis relies heavily on data-driven techniques.

This chapter introduces and motivates the topic of data-driven speech synthesis, and outlines the concepts that will be encountered in the rest of the book. The main problems that any TTS system must solve are: automatic generation of pronunciation, prosodic adjustment, and synthesis of the final output speech. The first of these problems has been quite well-studied and it is here that machine-learning techniques have been most obviously applied. Indeed, the problem of text-phoneme conversion (the ‘NETtalk’ problem) has become something of a benchmark in machine learning and, hence, we will have most to say on this topic. As the utility of data-driven methods becomes ever more widely accepted, however, attention is starting to turn to the use of these techniques in other areas of synthesis, most particularly modelling and generation of prosody, and the generation of the output speech itself.


Hide Unit · Speech Synthesis · Central Letter · Natural Language Processing Task · Generative Linguistic
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media Dordrecht 2001

Authors and Affiliations

  • Robert I. Damper (1)
  1. Department of Electronics and Computer Science, University of Southampton, UK
