Expressive Speech Recognition and Synthesis as Enabling Technologies for Affective Robot-Child Communication

  • Selma Yilmazyildiz
  • Wesley Mattheyses
  • Yorgos Patsis
  • Werner Verhelst
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4261)


Abstract

This paper presents our recent and ongoing work on expressive speech synthesis and recognition as enabling technologies for affective robot-child interaction. We show that current expression recognition systems could be used to discriminate between several archetypical emotions, but also that the old adage “there is no data like more data” is more valid than ever in this field. A new speech synthesizer was developed that is capable of high-quality concatenative synthesis. This system will be used in the robot to synthesize expressive nonsense speech by means of prosody transplantation and a recorded database with expressive speech examples. With these enabling components in place, we are getting ready to start experiments towards what we hope will be effective child-machine communication of affect and emotion.
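
The paper itself contains no code, but the prosody transplantation mentioned above rests on pitch-synchronous overlap-add (PSOLA) processing (see references 11-13 and 16 below). The following minimal Python sketch, written under our own simplifying assumptions, only illustrates that core TD-PSOLA operation: two-period, Hann-windowed grains are cut around given pitch marks of a carrier utterance and overlap-added at a new spacing, so that pitch and duration can be reshaped towards a target (e.g. expressive) prosody. The function name, its arguments, and the assumption of externally supplied pitch marks are illustrative and are not the authors' implementation.

    import numpy as np

    def td_psola(signal, marks, pitch_factor=1.0, time_factor=1.0):
        """Minimal TD-PSOLA resynthesis sketch (voiced speech only).

        signal       : 1-D numpy array with the carrier speech samples
        marks        : increasing array of pitch-mark sample indices (at least 3 marks)
        pitch_factor : f0 scaling factor (>1 raises the pitch)
        time_factor  : duration scaling factor (>1 slows the utterance down)
        """
        marks = np.asarray(marks, dtype=int)
        periods = np.diff(marks)                     # local analysis pitch periods
        out = np.zeros(int(len(signal) * time_factor) + 2 * int(periods.max()) + 2)

        t = marks[0] * time_factor                   # first synthesis pitch mark
        while t < len(signal) * time_factor:
            # analysis mark whose time-scaled position lies closest to t
            i = int(np.argmin(np.abs(marks * time_factor - t)))
            i = min(max(i, 1), len(marks) - 2)       # keep one full period on each side
            period = int(periods[i - 1])

            # two-period, Hann-windowed grain centred on the analysis mark
            grain = signal[marks[i] - period : marks[i] + period]
            grain = grain * np.hanning(len(grain))

            # overlap-add the grain centred on the synthesis mark
            start = int(round(t)) - period
            if start < 0:                            # clip grains that would start before sample 0
                grain, start = grain[-start:], 0
            out[start : start + len(grain)] += grain

            # shorter spacing between synthesis marks -> higher pitch
            t += period / pitch_factor

        return out[: int(len(signal) * time_factor)]

    # Hypothetical usage, assuming a carrier waveform `nonsense` (numpy array) and
    # pitch marks `pm` obtained elsewhere (e.g. with a robust pitch marker as in [16]):
    #     higher_and_slower = td_psola(nonsense, pm, pitch_factor=1.3, time_factor=1.2)
    # A transplantation system would derive the pitch and duration factors per segment
    # from the prosody of a recorded expressive example rather than fix them globally.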


Keywords (machine-generated): Speech Signal · Speech Synthesis · Emotional Speech · Pitch Period · Speech Synthesizer




References

  1. Simon et Odil: website for hospitalized children
  2. IBBT research project ASCIT: Again at my School by fostering Communication through Interactive Technologies for long term sick children
  3. Anty project website
  4. Anty foundation website
  5. Breazeal, C., Aryananda, L.: Recognition of Affective Communicative Intent in Robot-Directed Speech. Autonomous Robots 12, 83–104 (2002)
  6. Oudeyer, P.: The production and recognition of emotions in speech: features and algorithms. International Journal of Human-Computer Studies 59, 157–183 (2003)
  7. Slaney, M., McRoberts, G.: A Recognition System for Affective Vocalization. Speech Communication 39, 367–384 (2003)
  8. Ververidis, D., Kotropoulos, C.: Automatic speech classification to five emotional states based on gender information. In: Proceedings of EUSIPCO 2004, pp. 341–344 (2004)
  9. Hammal, Z., Bozkurt, B., Couvreur, L., Unay, D., Caplier, A., Dutoit, T.: Passive versus active: vocal classification system. In: Proceedings of EUSIPCO 2005 (2005)
  10. Shami, M., Verhelst, W.: Automatic Classification of Emotions in Speech Using Multi-Corpora Approaches. In: Proceedings of the Second Annual IEEE BENELUX/DSP Valley Signal Processing Symposium (SPS-DARTS) (2006)
  11. Verhelst, W., Borger, M.: Intra-Speaker Transplantation of Speech Characteristics: An Application of Waveform Vocoding Techniques and DTW. In: Proceedings of Eurospeech 1991, Genova, pp. 1319–1322 (1991)
  12. Van Coile, B., Van Tichelen, L., Vorstermans, A., Staessen, M.: Protran: A Prosody Transplantation Tool for Text-To-Speech Applications. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP 1994), Yokohama, pp. 423–426 (1994)
  13. Moulines, E., Charpentier, F.: Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones. Speech Communication 9, 453–467 (1990)
  14. Verhelst, W.: On the Quality of Speech Produced by Impulse Driven Linear Systems. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 1991), pp. 501–504 (1991)
  15. Mattheyses, W.: Vlaamstalige tekst-naar-spraak systemen met PSOLA (Flemish text-to-speech systems with PSOLA, in Dutch). Master's thesis, Vrije Universiteit Brussel (2006)
  16. Mattheyses, W., Verhelst, W., Verhoeve, P.: Robust Pitch Marking for Prosodic Modification of Speech Using TD-PSOLA. In: Proceedings of the IEEE Benelux/DSP Valley Signal Processing Symposium (SPS-DARTS), pp. 43–46 (2006)
  17. Conkie, A., Isard, S.: Optimal coupling of diphones. In: Proceedings of the 2nd ESCA/IEEE Workshop on Speech Synthesis (SSW2) (1994)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Selma Yilmazyildiz (1)
  • Wesley Mattheyses (1)
  • Yorgos Patsis (1)
  • Werner Verhelst (1)

  1. Dept. ETRO-DSSP, Vrije Universiteit Brussel, Brussels, Belgium
