Spectral Flatness Analysis for Emotional Speech Synthesis and Transformation

  • Conference paper
Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5641)

Abstract

According to psychological research on emotional speech, different emotions are accompanied by different amounts of spectral noise. We control the amount of noise through spectral flatness, which determines how much high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments provide a statistical analysis of spectral flatness for three emotions (joy, sadness, anger) and for a neutral state used for comparison. The calculated histograms of the spectral flatness distribution are compared visually and modelled by a Gamma probability distribution. The resulting statistical parameters and the emotional-to-neutral ratios of their mean values show good correlation for both male and female voices and for all three emotions.
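The analysis chain described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common definition of spectral flatness (geometric mean of the power spectrum divided by its arithmetic mean), a method-of-moments Gamma fit (the abstract does not say how the Gamma parameters were estimated), and hypothetical helper names (`power_spectrum`, `spectral_flatness`, `fit_gamma_moments`, `mean_ratio`). A naive DFT keeps the example free of external dependencies.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum of a real frame (bins 0..N/2). Illustrative only."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Assumed (common) definition; the result lies in (0, 1], near 1 for
    noise-like spectra and near 0 for strongly tonal spectra.
    """
    floored = [p + eps for p in power]  # floor avoids log(0)
    log_mean = sum(math.log(p) for p in floored) / len(floored)
    return math.exp(log_mean) / (sum(floored) / len(floored))


def fit_gamma_moments(values):
    """Method-of-moments Gamma fit (an assumption, not the paper's stated method).

    shape = mean^2 / variance, scale = variance / mean.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean * mean / var, var / mean


def mean_ratio(emotional, neutral):
    """Emotional-to-neutral ratio of mean spectral flatness values."""
    return (sum(emotional) / len(emotional)) / (sum(neutral) / len(neutral))


if __name__ == "__main__":
    random.seed(0)
    n = 64
    tone = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
    noise = [random.gauss(0.0, 1.0) for _ in range(n)]
    print("tone flatness :", spectral_flatness(power_spectrum(tone)))
    print("noise flatness:", spectral_flatness(power_spectrum(noise)))
```

In use, one would compute `spectral_flatness` per voiced frame for each emotion, fit a Gamma distribution to each set of values, and compare the means against the neutral set with `mean_ratio`. Frame length, windowing, and voicing detection are left out here because the abstract does not specify them.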

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Přibil, J., Přibilová, A. (2009). Spectral Flatness Analysis for Emotional Speech Synthesis and Transformation. In: Esposito, A., Vích, R. (eds) Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions. Lecture Notes in Computer Science, vol 5641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03320-9_11

  • DOI: https://doi.org/10.1007/978-3-642-03320-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03319-3

  • Online ISBN: 978-3-642-03320-9

  • eBook Packages: Computer Science, Computer Science (R0)
