Chapter

Computational and Ambient Intelligence

Volume 4507 of the series Lecture Notes in Computer Science pp 646-653

Validation of an Expressive Speech Corpus by Mapping Automatic Classification to Subjective Evaluation

  • Ignasi Iriondo (Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Pg. Bonanova 8, 08022 Barcelona)
  • Santiago Planet (Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Pg. Bonanova 8, 08022 Barcelona)
  • Francesc Alías (Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Pg. Bonanova 8, 08022 Barcelona)
  • Joan-Claudi Socoró (Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Pg. Bonanova 8, 08022 Barcelona)
  • Elisa Martínez (Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Pg. Bonanova 8, 08022 Barcelona)

Abstract

This paper presents the validation of the expressive content of an acted corpus produced for use in speech synthesis. Acted speech can lack authenticity, so its expressiveness must be validated. The goal is to obtain an automatic classifier able to prune the bad utterances, that is, those with the wrong expressiveness. First, a subjective test was conducted on almost ten percent of the corpus utterances. Second, an objective evaluation was carried out by automatically identifying emotions, using different algorithms applied to statistical features computed over the speech prosody. The two evaluations are related through an attribute selection process guided by a metric that measures the match between the utterances misclassified by the users and those misclassified by the automatic process. The experiments show that this approach can be useful for identifying a subset of utterances with poor or wrong expressive content.
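
The idea of attribute selection guided by a matching metric can be illustrated with a minimal sketch. The abstract does not specify the metric or the search strategy, so everything below is an assumption: the metric is taken to be the F1 overlap between the two sets of misclassified utterances, the search is a plain greedy forward selection, and the names `matching_f1`, `greedy_select`, and `misclassified_with` are hypothetical.

```python
from typing import Callable, List, Sequence, Set, Tuple


def matching_f1(subjective_bad: Set[str], automatic_bad: Set[str]) -> float:
    """Overlap (F1) between utterances rejected by the users and by the classifier.

    Assumption: the paper's actual matching metric is not given in the abstract;
    F1 over the two sets of utterance identifiers is used here as a stand-in.
    """
    if not subjective_bad and not automatic_bad:
        return 1.0  # both evaluations agree that nothing is misclassified
    hits = len(subjective_bad & automatic_bad)
    if hits == 0:
        return 0.0
    precision = hits / len(automatic_bad)
    recall = hits / len(subjective_bad)
    return 2 * precision * recall / (precision + recall)


def greedy_select(
    features: Sequence[str],
    misclassified_with: Callable[[List[str]], Set[str]],
    subjective_bad: Set[str],
) -> Tuple[List[str], float]:
    """Greedy forward selection of prosodic features (hypothetical search strategy).

    `misclassified_with(subset)` is a caller-supplied function that trains and
    evaluates a classifier on `subset` and returns the identifiers of the
    utterances it misclassifies.
    """
    selected: List[str] = []
    remaining = list(features)
    best = 0.0
    while remaining:
        # Score every candidate extension of the current subset.
        scored = [
            (matching_f1(subjective_bad, misclassified_with(selected + [f])), f)
            for f in remaining
        ]
        score, feature = max(scored)
        if score <= best:
            break  # no candidate improves the match with the subjective test
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best
```

Under these assumptions, the utterances that the final classifier misclassifies would form the candidate subset to prune, and the returned score indicates how closely that subset agrees with the subjective test.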