Designing High-Coverage Multi-level Text Corpus for Non-professional-voice Conservation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)


The paper focuses on building a text corpus suitable for conserving the voices of non-professional speakers who are losing their voices due to serious health problems. Since we do not know in advance how many sentences a speaker will be able to record, we propose a multi-level greedy algorithm that can ensure the selected texts cover various phonetic and prosodic units. The resulting coverage is presented for various corpus sizes and compared to that of a generic TTS corpus recorded by a healthy professional speaker.
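The abstract does not spell out the selection procedure, so the following is only a minimal sketch of what a multi-level greedy selection could look like, not the authors' implementation. It assumes that each level is defined by a unit extractor and that levels are processed in order, so that any prefix of the resulting sentence list covers the earlier-level units first. The function names (`phones`, `diphones`, `greedy_level`, `multi_level_selection`) are hypothetical, and letters and letter pairs are used as toy stand-ins for real phones and diphones, which would come from a phonetic transcription.

```python
from typing import Callable, List, Sequence, Set

UnitExtractor = Callable[[str], Set[str]]


def phones(sentence: str) -> Set[str]:
    """Toy stand-in for phones: the distinct letters of the sentence."""
    return {ch for ch in sentence.lower() if ch.isalpha()}


def diphones(sentence: str) -> Set[str]:
    """Toy stand-in for diphones: adjacent letter pairs within each word."""
    units: Set[str] = set()
    for word in sentence.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        units.update(a + b for a, b in zip(letters, letters[1:]))
    return units


def greedy_level(candidates: Sequence[str],
                 already_selected: Sequence[str],
                 extract: UnitExtractor) -> List[str]:
    """Greedily pick sentences that add the most not-yet-covered units."""
    covered: Set[str] = set()
    for sentence in already_selected:
        covered |= extract(sentence)
    remaining = [s for s in candidates if s not in already_selected]
    picked: List[str] = []
    while remaining:
        # Ties are broken by candidate order (max returns the first maximum).
        best = max(remaining, key=lambda s: len(extract(s) - covered))
        if not extract(best) - covered:
            break  # no remaining sentence adds a new unit at this level
        picked.append(best)
        covered |= extract(best)
        remaining.remove(best)
    return picked


def multi_level_selection(candidates: Sequence[str],
                          levels: Sequence[UnitExtractor]) -> List[str]:
    """Run the greedy step level by level, keeping earlier picks first."""
    selected: List[str] = []
    for extract in levels:
        selected += greedy_level(candidates, selected, extract)
    return selected


if __name__ == "__main__":
    corpus = ["ab", "abc", "cd", "ba"]
    print(multi_level_selection(corpus, [phones, diphones]))
```

Because the ordering is by level, a speaker who can record only the first few sentences still covers the coarser units, while later sentences add the finer ones; this is the property the sketch is meant to illustrate under the assumptions stated above.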


Keywords: Voice conservation, Voice banking, Speech synthesis, Text corpus, Phone, Diphone, Greedy algorithm



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Pilsen, Czech Republic
  2. NTIS-New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
