Language Resources and Evaluation, Volume 44, Issue 4, pp 371–386

Remote-based text-to-speech modules’ evaluation framework: the RES framework



The ECESS consortium (European Center of Excellence in Speech Synthesis) aims to speed up progress in speech synthesis technology by providing an appropriate evaluation framework. The key element of the framework is the partition of a text-to-speech (TTS) synthesis system into distributed TTS modules. Three modules have been specified so far: text processing, prosody generation, and acoustic synthesis. Splitting the system into modules has the advantage that developers at an institution active in ECESS can concentrate their efforts on a single module and test its performance in a complete system, using the missing modules from developers at other institutions. In this way, complete TTS systems can be built from high-performance modules developed at different institutions. To evaluate the modules and connect them efficiently, a remote evaluation platform, the Remote Evaluation System (RES), has been developed within ECESS on top of the existing internet infrastructure. The RES is based on a client–server architecture. It consists of RES module servers, which encapsulate the developers' modules; a RES client, which sends data to and receives data from the RES module servers; and a RES server, which connects the RES module servers and organizes the flow of information. Developers can use the RES to select, over the internet, a RES module server containing a missing TTS module that they need in order to test and improve the performance of their own modules. Finally, the RES allows the evaluation of TTS modules running at different institutions worldwide. Using the RES client, the institution performing the evaluation can set up and perform various evaluation tasks by sending test data via the RES client and receiving results from the RES module servers. ELDA is currently setting up an evaluation using the RES client, which will then be extended into an evaluation client specialized for the envisaged evaluation tasks.
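The division of roles described in the abstract (module servers that wrap individual TTS modules, a RES server that connects them, and a client that sends data and collects results) can be illustrated with a minimal in-process sketch. This is not the RES implementation or its protocol: every class, function, and institution name below is hypothetical, the three module functions are toy string transformations, and plain function calls stand in for the network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# The three module types named in the abstract, in pipeline order.
# The identifiers themselves are assumptions made for this sketch.
STAGES = ["text_processing", "prosody_generation", "acoustic_synthesis"]


@dataclass
class ModuleServer:
    """Stand-in for a RES module server wrapping one developer's TTS module."""
    institution: str
    stage: str
    process: Callable[[str], str]


@dataclass
class ResServer:
    """Stand-in for the RES server: registers module servers and routes data."""
    registry: Dict[str, Dict[str, ModuleServer]] = field(default_factory=dict)

    def register(self, server: ModuleServer) -> None:
        # Index module servers by stage, then by contributing institution.
        self.registry.setdefault(server.stage, {})[server.institution] = server

    def run(self, text: str, selection: Dict[str, str]) -> Tuple[str, List[str]]:
        """Pass the data through the selected module of each stage, in order."""
        data, trace = text, []
        for stage in STAGES:
            module = self.registry[stage][selection[stage]]
            data = module.process(data)
            trace.append(f"{stage}@{module.institution}")
        return data, trace


def res_client(server: ResServer, text: str,
               selection: Dict[str, str]) -> Tuple[str, List[str]]:
    """Stand-in for the RES client: sends test data and receives the result."""
    return server.run(text, selection)


def build_demo_server() -> ResServer:
    """Register toy modules from two hypothetical institutions."""
    server = ResServer()
    server.register(ModuleServer(
        "inst_a", "text_processing", lambda t: " ".join(t.lower().split())))
    server.register(ModuleServer(
        "inst_b", "prosody_generation", lambda t: t + " [prosody]"))
    server.register(ModuleServer(
        "inst_a", "acoustic_synthesis", lambda t: f"<audio:{t}>"))
    return server


if __name__ == "__main__":
    # inst_a tests its own modules together with inst_b's prosody module.
    chosen = {
        "text_processing": "inst_a",
        "prosody_generation": "inst_b",
        "acoustic_synthesis": "inst_a",
    }
    output, trace = res_client(build_demo_server(), "Hello   World", chosen)
    print(output)
    print(trace)
```

The `selection` mapping mirrors the use case in the abstract: an institution keeps its own module for the stages it develops and borrows the missing stage from another institution, while the returned trace records which module server handled each step.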


Keywords: Remote text-to-speech synthesis evaluation; Text-to-speech synthesis modules; ECESS consortium



Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
  2. IC 5, Siemens AG, Corporate Technology, München, Germany
