
Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features

Chapter in: Robust Emotion Recognition using Spectral and Prosodic Features

Part of the book series: SpringerBriefs in Electrical and Computer Engineering

Abstract

This chapter discusses the use of vocal tract information for recognizing emotions. Linear prediction cepstral coefficients (LPCCs) and mel frequency cepstral coefficients (MFCCs) are used as correlates of vocal tract information. In addition to LPCCs and MFCCs, formant-related features are explored for recognizing emotions from speech. Extraction of these spectral features is discussed briefly, followed by their extraction from sub-syllabic regions such as consonants, vowels, and consonant-vowel transition regions, and from pitch synchronous analysis. The basic philosophy and use of Gaussian mixture models (GMMs) for classifying emotions are also discussed. The emotion recognition performance obtained from the different vocal tract features is compared. The proposed spectral features are evaluated on Indian and Berlin emotion databases, and the performance of GMMs in classifying emotional utterances using vocal tract features is compared with that of neural network models.
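The recognition pipeline outlined in the abstract, frame-level spectral features scored against one Gaussian mixture model per emotion, can be illustrated with a short sketch in Python. This is not the authors' implementation: the 20 ms/10 ms framing, 13 MFCCs, 16 mixture components, and the librosa and scikit-learn calls are illustrative assumptions, and the pitch synchronous and sub-syllabic region selection described in the chapter is omitted for brevity.

```python
# Minimal sketch (not the authors' implementation) of frame-level MFCC
# extraction and GMM-based emotion classification. Frame sizes, feature
# dimension, and mixture count are illustrative assumptions.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Return one 13-dimensional MFCC vector per 20 ms frame (10 ms shift)."""
    y, sr = librosa.load(path, sr=sr)
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                 n_fft=int(0.020 * sr),
                                 hop_length=int(0.010 * sr))
    return feats.T  # shape: (n_frames, n_mfcc)

def train_emotion_gmms(utterances_by_emotion, n_components=16):
    """Fit one diagonal-covariance GMM per emotion on pooled frame vectors."""
    models = {}
    for emotion, paths in utterances_by_emotion.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[emotion] = GaussianMixture(n_components=n_components,
                                          covariance_type='diag').fit(X)
    return models

def classify(path, models):
    """Label an utterance with the emotion whose GMM yields the highest
    average frame log-likelihood."""
    X = mfcc_frames(path)
    return max(models, key=lambda e: models[e].score(X))
```

Scoring by average frame log-likelihood keeps the decision independent of utterance length, which is the usual way GMM-based classifiers are applied to variable-length speech.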




Author information


Correspondence to K. Sreenivasa Rao.


Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Rao, K.S., Koolagudi, S.G. (2013). Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features. In: Robust Emotion Recognition using Spectral and Prosodic Features. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6360-3_2


  • DOI: https://doi.org/10.1007/978-1-4614-6360-3_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6359-7

  • Online ISBN: 978-1-4614-6360-3

  • eBook Packages: Engineering (R0)
