Validation of an Expressive Speech Corpus by Mapping Automatic Classification to Subjective Evaluation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4507)


This paper presents the validation of the expressive content of an acted corpus produced for use in speech synthesis. Acted speech can lack authenticity, so its expressiveness must be validated. The goal is to obtain an automatic classifier able to prune the bad utterances, that is, those with wrong expressiveness. First, a subjective test was conducted on almost ten percent of the corpus utterances. Second, an objective evaluation was performed by automatic emotion identification, applying different algorithms to statistical features computed over the speech prosody. The two evaluations are linked through an attribute selection process guided by a metric that measures how well the utterances misclassified by the listeners match those misclassified by the automatic process. The experiments show that this approach can be useful for identifying a subset of utterances with poor or wrong expressive content.
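The abstract does not say which matching metric guides the attribute selection. As a minimal sketch only, the following assumes an F1-style overlap between the set of utterances misclassified by the listeners and the set misclassified by the automatic classifier. The function name `matching_f1` and the utterance identifiers are hypothetical and do not come from the paper.

```python
def matching_f1(subjective_bad, automatic_bad):
    """F1-style overlap between two collections of utterance IDs.

    Assumption: F1 stands in for the paper's unspecified matching metric.
    """
    subjective_bad = set(subjective_bad)
    automatic_bad = set(automatic_bad)
    # Utterances flagged by both the listeners and the classifier.
    hits = len(subjective_bad & automatic_bad)
    if hits == 0:
        return 0.0
    precision = hits / len(automatic_bad)   # share of automatic flags confirmed by listeners
    recall = hits / len(subjective_bad)     # share of listener flags found automatically
    return 2 * precision * recall / (precision + recall)


# Hypothetical utterance IDs, for illustration only.
listeners = {"u03", "u07", "u12", "u20"}
classifier = {"u03", "u07", "u15"}
score = matching_f1(listeners, classifier)
print(round(score, 4))
```

Under this assumption, an attribute selection loop would keep the feature subset whose classifier yields the highest score against the subjective test.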





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  1. Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Pg. Bonanova 8, 08022 Barcelona, Spain
