Validation of an Expressive Speech Corpus by Mapping Automatic Classification to Subjective Evaluation
This paper presents the validation of the expressive content of an acted corpus recorded for use in speech synthesis. Acted speech can lack authenticity, so its expressiveness must be validated. The goal is to obtain an automatic classifier able to prune the bad utterances, that is, those with wrong expressiveness. First, a subjective test was conducted on almost ten percent of the corpus utterances. Second, an objective evaluation was carried out through automatic emotion identification, using different algorithms applied to statistical features computed over the speech prosody. The two evaluations are related through an attribute selection process guided by a metric that measures how well the utterances misclassified by the users match those misclassified by the automatic process. The experiments show that this approach can be useful for identifying a subset of utterances with poor or wrong expressive content.
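As a rough illustration of attribute selection guided by a matching metric, the sketch below scores a feature subset by how closely the utterances misclassified by the automatic process overlap with those misclassified by the users. The abstract does not specify the metric or the search strategy, so both are assumptions here: an F1-style set overlap (`overlap_f1`) and a greedy forward search (`greedy_forward_selection`). The callback `machine_errors_for` is a hypothetical stand-in for training and evaluating a classifier on a given attribute subset.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple


def overlap_f1(human_errors: Set[str], machine_errors: Set[str]) -> float:
    """F1-style agreement between two sets of misclassified utterance ids.

    Assumption: this is an illustrative metric, not the one used in the paper.
    """
    if not human_errors and not machine_errors:
        return 1.0  # both evaluations agree that nothing is misclassified
    common = len(human_errors & machine_errors)
    if common == 0:
        return 0.0
    precision = common / len(machine_errors)
    recall = common / len(human_errors)
    return 2 * precision * recall / (precision + recall)


def greedy_forward_selection(
    attributes: Iterable[str],
    machine_errors_for: Callable[[FrozenSet[str]], Set[str]],
    human_errors: Set[str],
) -> Tuple[FrozenSet[str], float]:
    """Add one attribute at a time while the matching score keeps improving.

    `machine_errors_for` (hypothetical) returns the ids of the utterances
    that a classifier trained on the given attribute subset misclassifies.
    """
    remaining = list(attributes)
    selected: FrozenSet[str] = frozenset()
    best_score = -1.0
    while remaining:
        # Score every candidate extension of the current subset.
        scored = [
            (overlap_f1(human_errors, machine_errors_for(selected | {a})), a)
            for a in remaining
        ]
        score, attribute = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the match; stop searching
        selected = selected | {attribute}
        best_score = score
        remaining.remove(attribute)
    return selected, best_score
```

In use, `human_errors` would hold the utterances that the subjective test flagged as wrongly expressed, and the utterances misclassified by the classifier built on the selected attributes would be the candidates for pruning.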