Modelling F0 Dynamics in Unit Selection Based Speech Synthesis

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8655)

Abstract

In common unit selection implementations, F0 continuity is measured as one of the concatenation cost features, with the expectation that a smooth transition between units (in terms of speech melody) is ensured when the F0 difference is low enough. This measure generally uses a static F0 value computed at the unit boundary. In the present paper we show, however, that static F0 values are not sufficient for smooth concatenation of speech units, and that the dynamic nature of the F0 contour must be taken into account. Two schemes of dynamic F0 handling are presented, and speech generated by both schemes is compared by means of listening tests on specially selected phrases that are known to carry unnatural artefacts. The advantages and disadvantages of the individual schemes are also discussed.
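
The contrast between a static and a dynamic F0 measure can be illustrated with a minimal sketch. It is not either of the two schemes presented in the paper, which the abstract does not describe. The function names, the frame-based F0 lists, and the use of a least-squares slope over a short context window are all assumptions made for illustration only.

```python
def static_f0_cost(left_f0, right_f0):
    """Absolute F0 difference (Hz) between the last frame of the left unit
    and the first frame of the right unit."""
    return abs(left_f0[-1] - right_f0[0])


def slope(values):
    """Least-squares slope of the values against their frame index
    (Hz per frame). Requires at least two values."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def dynamic_f0_cost(left_f0, right_f0, context=3):
    """Hypothetical dynamic measure: difference between the F0 slope at the
    end of the left unit and the F0 slope at the start of the right unit."""
    return abs(slope(left_f0[-context:]) - slope(right_f0[:context]))


# A rising contour joined to a falling contour that meet at the same F0.
left = [100.0, 110.0, 120.0]   # F0 per frame, Hz (illustrative values)
right = [120.0, 110.0, 100.0]

static = static_f0_cost(left, right)    # boundary values match exactly
dynamic = dynamic_f0_cost(left, right)  # slopes are +10 and -10 Hz/frame
print(static, dynamic)
```

In this constructed case the static cost is zero even though the melody changes direction abruptly at the join, whereas the slope-based cost is large. This is the kind of mismatch a boundary-only measure cannot detect.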

Keywords

  • text-to-speech synthesis
  • unit selection
  • concatenation cost
  • fundamental frequency F0

The research has been supported by the European Regional Development Fund (ERDF), project “New Technologies for Information Society” (NTIS), European Centre of Excellence, ED1.1.00/02.0090, and by the Technology Agency of the Czech Republic, project No. TA01011264.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tihelka, D., Matoušek, J., Hanzlíček, Z. (2014). Modelling F0 Dynamics in Unit Selection Based Speech Synthesis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_55

  • DOI: https://doi.org/10.1007/978-3-319-10816-2_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science, Computer Science (R0)