Emilia: a speech corpus for Argentine Spanish text to speech synthesis

  • Original Paper
Language Resources and Evaluation

Abstract

This paper introduces Emilia, a speech corpus created to build a female voice in Buenos Aires Spanish for the Aromo text-to-speech system. Aromo is a unit-selection text-to-speech system that uses diphones as its synthesis units. The key design requirement for Emilia was to synthesize any Spanish text as high-quality speech from a corpus of minimum size. The text corpus was designed to guarantee phonetic and prosodic coverage, following a three-stage strategy. In the first stage, 741 sentences were designed to include all of the syllables of Spanish as spoken in Argentina, with and without stress, and in all positions within the word. In the second stage, 852 sentences were added to balance the distribution of the diphones. In the third and final stage, which followed a perceptual evaluation of the quality of the synthesized speech, 625 sentences were added to achieve the specified unit coverage and to introduce sentences with more complex syntactic and prosodic structures. Issues from all three corpus-building stages are reported. The paper also presents the results of the perceptual quality evaluations of the synthesized voice. Emilia has a duration of three hours and 15 minutes; the quality of the speech synthesized from it with the Aromo system is similar to that obtained with commercial systems, with a real-time ratio of less than one.
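To make the notion of diphone coverage concrete, the following is a minimal sketch of how a sentence set might be chosen to cover a target diphone inventory. It is not the procedure used to build Emilia: the abstract does not specify a selection algorithm, so a generic greedy set-cover heuristic is assumed here. The function names, the `#` silence marker, and the toy phone transcriptions are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs in one sentence.

    The sentence is padded with '#' (an assumed silence marker) so that
    sentence-initial and sentence-final transitions are counted too.
    """
    padded = ["#"] + phones + ["#"]
    return set(zip(padded, padded[1:]))


def greedy_select(candidates: Dict[str, List[str]],
                  target: Set[Diphone]) -> List[str]:
    """Greedily pick sentence ids until the target diphones are covered.

    At each step the sentence that adds the most still-missing diphones
    is chosen; ties are broken by sentence id. Selection stops when the
    target is covered or when no remaining sentence adds coverage.
    """
    remaining = set(target)
    pool = {sid: diphones(p) for sid, p in candidates.items()}
    chosen: List[str] = []
    while remaining and pool:
        best = max(sorted(pool), key=lambda sid: len(pool[sid] & remaining))
        gain = pool[best] & remaining
        if not gain:
            break  # nothing left in the pool improves coverage
        chosen.append(best)
        remaining -= gain
        del pool[best]
    return chosen


# Toy example with hypothetical phone strings (not taken from Emilia).
candidates = {
    "s1": ["k", "a", "s", "a"],
    "s2": ["m", "e", "s", "a"],
    "s3": ["s", "a"],
}
target = diphones(candidates["s1"]) | diphones(candidates["s2"])
selected = greedy_select(candidates, target)
```

In this toy case `s3` is never selected, because every diphone it contributes to the target is already supplied by `s1` and `s2`. A balancing stage such as the second one described above would additionally weigh how often each diphone occurs, which this sketch does not attempt.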

Notes

  1. http://htk.eng.cam.ac.uk.

  2. http://www.ilc.cnr.it/EAGLES96/home.html.

  3. http://nlp.lsi.upc.edu/freeling/doc/tagsets/tagset-es.html.

  4. http://www.speech.kth.se/software/.

  5. http://www.fon.hum.uva.nl/praat/.

  6. http://www.mathworks.com/.

  7. These resources are available in full, in part, or as demonstrations, for academic or commercial purposes, on request by e-mail to the authors.

Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful feedback. This research was supported by Ministerio de Ciencia y Tecnología and Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina.

Author information

Corresponding author

Correspondence to Humberto M. Torres.

About this article

Cite this article

Torres, H.M., Gurlekian, J.A., Evin, D.A. et al. Emilia: a speech corpus for Argentine Spanish text to speech synthesis. Lang Resources & Evaluation 53, 419–447 (2019). https://doi.org/10.1007/s10579-019-09447-7

  • DOI: https://doi.org/10.1007/s10579-019-09447-7