Mining Intonation Corpora Using Knowledge Driven Sequential Clustering

  • David Escudero-Mancebo
  • Valentín Cardeñoso-Payo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4140)


This work presents a mining methodology designed to cope with the data scarcity problems typical of intonation corpora, which arise from the high variability of prosodic information. The methodology adapts a basic agglomerative clustering technique and is guided by a set of domain constraints. The peculiarities of the text-to-speech intonation modelling problem are taken into account to fix the initial configuration of the clustering, the criteria for merging classes, and the criteria for stopping their splitting. Because of data scarcity, a sequential selection mechanism of prosodic features is needed to obtain the initial set of classes. A search strategy that selects the best class among a set of alternatives is proposed; it yields prediction models useful for accurate synthetic intonation. Visualizing the final classes by means of a modified decision tree provides graphical cues about the contrastable prosodic information in the intonation corpus.
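
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of agglomerative clustering under a domain constraint, not the authors' method. It assumes that each initial class is labelled by a prosodic feature combination and holds a few F0 contours as fixed-length vectors, that a hypothetical constraint `same_sentence_type` forbids merging classes of different sentence types, and that merging stops once no under-populated class has an admissible partner. All names, labels, and values are illustrative.

```python
from itertools import combinations


def centroid(contours):
    """Element-wise mean of equal-length contour vectors."""
    n = len(contours)
    return [sum(column) / n for column in zip(*contours)]


def distance(contours_a, contours_b):
    """Euclidean distance between the centroids of two classes."""
    ca, cb = centroid(contours_a), centroid(contours_b)
    return sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5


def constrained_agglomerate(classes, can_merge, min_size):
    """Repeatedly merge the closest admissible pair of classes.

    classes   : dict mapping a tuple of labels -> list of contours
    can_merge : hypothetical domain constraint, f(labels_a, labels_b) -> bool
    min_size  : hypothetical scarcity threshold; a pair is a merge candidate
                only if at least one of its classes has fewer samples
    """
    classes = dict(classes)  # work on a copy
    while True:
        candidates = [
            (distance(classes[a], classes[b]), a, b)
            for a, b in combinations(sorted(classes), 2)
            if can_merge(a, b)
            and min(len(classes[a]), len(classes[b])) < min_size
        ]
        if not candidates:
            break  # nothing scarce can be merged without violating the constraint
        _, a, b = min(candidates)
        merged = tuple(sorted(a + b))
        classes[merged] = classes.pop(a) + classes.pop(b)
    return classes


def same_sentence_type(labels_a, labels_b):
    """Illustrative constraint: never mix sentence types in one class."""
    types = {label.split("/")[1] for label in labels_a + labels_b}
    return len(types) == 1


# Toy data: labels are "stress-group position/sentence type" (made up).
toy = {
    ("initial/declarative",): [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9]],
    ("final/declarative",): [[1.1, 2.0]],
    ("initial/interrogative",): [[1.0, 2.1]],
    ("final/interrogative",): [[3.0, 1.0], [3.1, 1.1], [2.9, 0.9]],
}
result = constrained_agglomerate(toy, same_sentence_type, min_size=2)
for labels, contours in sorted(result.items()):
    print(labels, len(contours))
```

In this toy run the single initial/interrogative contour is numerically closer to the declarative classes, yet the constraint keeps it with the interrogative ones. This illustrates how domain knowledge can override a purely distance-based merge.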


Cluster Technique, Acoustic Parameter, Stress Group, Speaker Recognition, Prosodic Feature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David Escudero-Mancebo (1)
  • Valentín Cardeñoso-Payo (1)
  1. Department of Computer Science, University of Valladolid, Valladolid, Spain
