Learning Full Pitch Variation Patterns with Neural Nets

  • Tingshao Zhu
  • Wen Gao
  • Charles X. Ling
Conference paper


The prosodic model is very important for speech synthesis. It comprises a pitch model, a duration model and a pause model, of which the pitch model is the most important. At present, most pitch models are constructed by linguistics experts and are described qualitatively, with low precision. We treat the pitch model as a mapping between the pitch of an isolated syllable and the pitch of the same syllable within a phrase, so a neural network can be used to learn the patterns. To acquire these patterns quantitatively and precisely, back-propagation (BP) networks are built to extract pitch and duration variation patterns from a large speech database. Because the networks are trained on actual speech samples, speech synthesized with them can be of high quality. In this paper, the architecture is specified first, then the new time warping algorithm and the networks are introduced in detail, and finally the results are given.
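
The mapping idea in the abstract can be illustrated with a small sketch. It is not the authors' system: the network size, the learning rate, the linear resampling used in place of the paper's time warping algorithm, and the synthetic "in-phrase" rule are all assumptions made for illustration, and the names `resample`, `BPNet` and `make_toy_pairs` are hypothetical. The sketch trains a one-hidden-layer back-propagation network to map a fixed-length, normalized pitch contour of an isolated syllable to a contour for the same syllable in a phrase.

```python
import math
import random


def resample(contour, n):
    """Linearly resample a pitch contour to n points (n >= 2).

    Assumption: a simple stand-in for time normalization, not the
    paper's time warping algorithm.
    """
    if len(contour) == 1:
        return [contour[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


class BPNet:
    """One hidden tanh layer, linear outputs, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One stochastic gradient step; returns the mean squared error."""
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        err = [yi - ti for yi, ti in zip(y, target)]
        # Back-propagate the output error to the hidden layer before
        # the output weights are changed.
        d_h = [(1 - h[j] ** 2) * sum(err[k] * self.w2[k][j]
                                     for k in range(len(err)))
               for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * err[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_h[j] * xb[i]
        return sum(e * e for e in err) / len(err)


def make_toy_pairs(n_pairs, n_points, seed=1):
    """Synthetic (isolated, in-phrase) contour pairs; not real speech data."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start, end = rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)
        length = rng.randint(5, 12)  # raw contours differ in length
        raw = [start + (end - start) * t / (length - 1) for t in range(length)]
        isolated = resample(raw, n_points)
        # Made-up rule: compressed, shifted range inside a phrase.
        in_phrase = [0.6 * p + 0.1 for p in isolated]
        pairs.append((isolated, in_phrase))
    return pairs


def train(net, pairs, epochs=200, lr=0.05):
    """Return the mean training error per epoch."""
    history = []
    for _ in range(epochs):
        total = sum(net.train_step(x, t, lr) for x, t in pairs)
        history.append(total / len(pairs))
    return history


pairs = make_toy_pairs(n_pairs=40, n_points=4)
net = BPNet(n_in=4, n_hidden=6, n_out=4)
history = train(net, pairs)
print("first epoch error:", history[0], "last epoch error:", history[-1])
```

In a real setting the training pairs would come from a speech database rather than a synthetic rule, and the duration patterns mentioned in the abstract would need their own inputs and outputs.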


Speech Data, Speech Synthesis, Duration Model, Pitch Variation, Speech Database





Copyright information

© Springer-Verlag London Limited 1999

Authors and Affiliations

  • Tingshao Zhu (1)
  • Wen Gao (1)
  • Charles X. Ling (2)
  1. MOTOROLA-ICT Joint R&D Lab, Institute of Computing Technology, Academia Sinica, Beijing, China
  2. Department of Computer Science, University of Western Ontario, London, Canada
