Abstract
Spectral and prosodic modifications for emotional speech synthesis based on harmonic modelling are described. The log spectral envelope is inverse Fourier transformed and parameterized by an autoregressive model. A spectral flatness measure determines the voicing transition frequency, which divides the spectrum of the synthesized speech into a minimum-phase band and a random-phase band of the harmonic model. Female emotional voice conversion is evaluated by a listening test.
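The two core ideas of the abstract — a spectral flatness measure locating the voicing transition frequency, and hybrid harmonic synthesis with deterministic phase below that frequency and random phase above it — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the band size, flatness threshold, and the use of zero phase as a stand-in for minimum phase are assumptions made for the sketch.

```python
import numpy as np

def spectral_flatness(power_spectrum):
    """Spectral flatness: ratio of geometric to arithmetic mean of the
    power spectrum (close to 1 for flat, noise-like spectra; near 0 for
    peaky, harmonic spectra)."""
    ps = np.asarray(power_spectrum, dtype=float) + 1e-12  # avoid log(0)
    return np.exp(np.mean(np.log(ps))) / np.mean(ps)

def voicing_transition_bin(power_spectrum, threshold=0.5, band=16):
    """Scan the spectrum in blocks of `band` bins from low to high
    frequency and return the first bin index where flatness exceeds
    `threshold`, i.e. where the spectrum stops looking harmonic.
    The threshold and block size are illustrative, not from the paper."""
    for start in range(0, len(power_spectrum) - band, band):
        if spectral_flatness(power_spectrum[start:start + band]) > threshold:
            return start
    return len(power_spectrum)

def harmonic_synthesis(amplitudes, f0, fs, n_samples, cutoff_hz, rng=None):
    """Hybrid harmonic synthesis: harmonics below `cutoff_hz` get a
    deterministic (here zero, standing in for minimum) phase; harmonics
    above it get random phase, giving the noise-like upper band."""
    rng = np.random.default_rng(0) if rng is None else rng
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for k, a in enumerate(amplitudes, start=1):
        fk = k * f0
        if fk >= fs / 2:          # stay below the Nyquist frequency
            break
        phase = 0.0 if fk < cutoff_hz else rng.uniform(-np.pi, np.pi)
        out += a * np.cos(2 * np.pi * fk * t + phase)
    return out
```

In this sketch the flatness-based cutoff plays the role of the voicing transition frequency described in the abstract: everything below it is rendered as a coherent harmonic structure, everything above as randomized-phase harmonics that perceptually approximate the aperiodic component.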
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Přibilová, A., Přibil, J. (2009). Harmonic Model for Female Voice Emotional Synthesis. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04390-1
Online ISBN: 978-3-642-04391-8