Application of Expressive Speech in TTS System with Cepstral Description

Přibil, Jiří; Přibilová, Anna

doi:10.1007/978-3-540-70872-8_15

Jiří Přibil &
Anna Přibilová

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5042)

Abstract

Expressive speech synthesis representing different human emotions has long been of interest to researchers. Recently, experiments with a storytelling speaking style have also been performed. This particular speaking style is suitable for applications aimed at children as well as for special applications aimed at blind people. Based on an analysis of human storytellers’ speech, we designed a set of prosodic parameter prototypes for converting speech produced by the text-to-speech (TTS) system into storytelling speech. In addition to the suprasegmental characteristics (pitch, intensity, and duration) included in these speech prototypes, we also used information about the significant frequencies of the spectral envelope and about spectral flatness, which determines the degree of voicing.
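The abstract names spectral flatness as the quantity that determines the degree of voicing. As a minimal sketch, assuming the standard spectral flatness measure (the ratio of the geometric mean to the arithmetic mean of a frame's power spectrum), it can be computed as below. The function names, the naive DFT, and the `eps` floor are illustrative assumptions, not the authors' implementation, and the paper's actual mapping from flatness to a degree of voicing is not reproduced here.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Power spectrum |X[k]|^2 for k = 0..N/2 via a naive O(N^2) DFT (illustration only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum, in (0, 1].

    Values near 0 indicate a peaky (harmonic, voiced-like) spectrum;
    values near 1 indicate a flat (noise-like, unvoiced) spectrum.
    eps is a hypothetical floor that keeps log() finite for empty bins.
    """
    p = [v + eps for v in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic


if __name__ == "__main__":
    n = 128
    # A single sinusoid centred on DFT bin 5: strongly peaked spectrum.
    tone = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    # Gaussian white noise with a fixed seed: roughly flat spectrum.
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(f"tone  SFM = {spectral_flatness(tone):.3g}")
    print(f"noise SFM = {spectral_flatness(noise):.3g}")
```

Running the script shows the tonal frame with a flatness many orders of magnitude below that of the noise frame, which is the contrast a voicing decision relies on. The naive DFT only keeps the sketch dependency-free; a practical analysis would use an FFT on windowed frames.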

Author information

Authors and Affiliations

Institute of Photonics and Electronics, Academy of Sciences CR, v.v.i., Chaberská 57, CZ-182 51, Prague 8, Czech Republic
Jiří Přibil
Faculty of Electrical Engineering & Information Technology, Dept. of Radio Electronics, Slovak University of Technology, Ilkovičova 3, SK-812 19, Bratislava, Slovakia
Anna Přibilová

Editor information

Editors and Affiliations

Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare (SA), Italy
Anna Esposito
ATRC Center, Wright State University, Dayton, OH, USA
Nikolaos G. Bourbakis
Human Computer Interaction Group, University of Patras, Rio Patras, Greece
Nikolaos Avouris
Department of Computer Engineering, University of Patras, Patras, Greece
Ioannis Hatzilygeroudis

About this paper

Cite this paper

Přibil, J., Přibilová, A. (2008). Application of Expressive Speech in TTS System with Cepstral Description. In: Esposito, A., Bourbakis, N.G., Avouris, N., Hatzilygeroudis, I. (eds) Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction. Lecture Notes in Computer Science, vol 5042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70872-8_15

DOI: https://doi.org/10.1007/978-3-540-70872-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70871-1
Online ISBN: 978-3-540-70872-8
eBook Packages: Computer Science, Computer Science (R0)