Emotion Recognition Using Prosodic Information

  • Sreenivasa Rao Krothapalli
  • Shashidhar G. Koolagudi
Chapter
Part of the SpringerBriefs in Electrical and Computer Engineering book series (BRIEFSELECTRIC)

Abstract

In this chapter, prosodic features are investigated for discriminating emotions from speech. The motivation for exploring prosodic features to recognize emotions is illustrated using the gross statistics and time-varying patterns of prosodic parameters. Global prosodic features, representing the gross statistics of prosody, and local prosodic features, representing the finer variations in prosody, are introduced in this chapter for discriminating emotions. Extraction procedures for the global and local prosodic features are briefly discussed. In this study, support vector machines are used for capturing the emotion-specific information from the proposed global and local prosodic features. The performance of the developed emotion recognition systems is analyzed with respect to individual components of prosody and their combinations.
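The idea of global prosodic features can be illustrated with a short sketch. The Python fragment below (a minimal sketch, not the pipeline developed in this chapter) computes gross statistics of the pitch and energy contours plus utterance duration and feeds them to an RBF support vector machine; the librosa and scikit-learn calls, the assumed pitch range (60–400 Hz), and the SVM parameters are illustrative assumptions only.

```python
# Minimal sketch: global prosodic statistics of an utterance classified with an SVM.
# Assumes librosa and scikit-learn are available; not the authors' exact feature set.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def global_prosodic_features(wav_path):
    """Gross statistics of the pitch and energy contours of one utterance."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)  # pitch contour
    f0 = f0[~np.isnan(f0)]                      # keep voiced frames only
    energy = librosa.feature.rms(y=y)[0]        # frame-level RMS energy
    stats = lambda x: [np.mean(x), np.std(x), np.min(x), np.max(x), np.ptp(x)]
    return np.array(stats(f0) + stats(energy) + [len(y) / sr])  # append duration (s)

# Hypothetical usage: wav_files and labels are lists of utterances and emotion tags.
# X = np.vstack([global_prosodic_features(p) for p in wav_files])
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
# clf.fit(X, labels)
```

Local prosodic features could be obtained analogously by computing such statistics over shorter regions of the utterance (capturing the finer variations in prosody) rather than over the whole signal.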


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Sreenivasa Rao Krothapalli (1)
  • Shashidhar G. Koolagudi (2)
  1. School of Information Technology, Indian Institute of Technology, Kharagpur, India
  2. Department of Computer Science, Graphic Era University, Dehradun, India
