Scores Selection for Emotional Speaker Recognition

  • Zhenyu Shan
  • Yingchun Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5558)


Emotion variability between training and testing utterances is one of the greatest challenges in speaker recognition. A common situation is that the training data is neutral speech while the testing data is a mixture of neutral and emotional speech. In this paper, we experimentally analyze the performance of a GMM-based verification system on utterances of this kind. The analysis reveals that verification performance improves as the emotion ratio decreases, and that the scores of neutral features against the speaker's own model are distributed above the other three kinds of scores (neutral speech against the models of other speakers, and non-neutral speech against the models of the speaker himself/herself and of other speakers). Based on these observations, we propose a scores selection method that reduces the emotion ratio of the testing utterance by eliminating non-neutral features. It is applicable to GMM-based recognition systems without labeling the emotion state during testing. Experiments carried out on the MASC Corpus show that scores selection improves system performance, reducing the EER from 13.52% to 10.17%.
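The scores selection idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs for the claimed speaker and a background model, scores every frame, keeps only the highest-scoring fraction of frames (presumed neutral, following the paper's observation that neutral frames score in the upper area), and averages their log-likelihood ratios. The function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def gmm_frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature matrix; weights: (M,); means, variances: (M, D).
    Returns an array of shape (T,).
    """
    T, D = frames.shape
    diff = frames[:, None, :] - means[None, :, :]            # (T, M, D)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_norm = -0.5 * (D * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1))  # (M,)
    log_comp = np.log(weights) + log_norm + exponent         # (T, M)
    # log-sum-exp over mixture components, computed stably
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

def scores_selection(frames, target_gmm, ubm, keep_ratio=0.7):
    """Score an utterance while discarding presumed non-neutral frames.

    Each frame is scored as a log-likelihood ratio between the claimed
    speaker's GMM and a background model (UBM); only the top keep_ratio
    fraction of frames is averaged, lowering the effective emotion ratio.
    """
    llr = gmm_frame_loglik(frames, *target_gmm) - gmm_frame_loglik(frames, *ubm)
    k = max(1, int(keep_ratio * len(llr)))
    return np.sort(llr)[-k:].mean()
```

Because the mean of the top-scoring frames is never below the mean over all frames, selection can only raise the verification score of a genuine trial whose low-scoring frames are emotional; the choice of `keep_ratio` (here a fixed fraction) would in practice be tuned to the expected emotion ratio.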


Keywords: Speaker Recognition · Emotional Speech · Feature Selection


References

  1. Reynolds, D.A., Rose, R.C.: Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models. IEEE Transactions on Speech and Audio Processing 3(1), 72–83 (1995)
  2. Wu, Z., Li, D., Yang, Y.: Rules Based Feature Modification for Affective Speaker Recognition. In: ICASSP 2006, vol. 1, pp. 661–664 (2006)
  3. Scherer, K.R., Johnstone, T., Klasmeyer, G.: Can automatic speaker verification be improved by training the algorithms on emotional speech? In: Proceedings of ICSLP 2000, Beijing, China, pp. 807–810 (2000)
  4. Wu, T., Yang, Y., Wu, Z., Li, D.: MASC: A Speech Corpus in Mandarin for Emotion Analysis and Affective Speaker Recognition. In: Odyssey 2006, June 2006, pp. 1–5 (2006)
  5. Wu, W., Zheng, T.F., Xu, M.-X., Bao, H.-J.: Study on Speaker Verification on Emotional Speech. In: ICSLP 2006, September 2006, pp. 2102–2105 (2006)
  6. Scherer, K.R.: A cross-cultural investigation of emotion inferences from voice and speech: implication for speech technology. In: Proc. ICSLP 2000 (2000)
  7. Scherer, K.R., Johnstone, T., Bänziger, T.: Automatic verification of emotionally stressed speakers: The problem of individual differences. In: Proc. of SPECOM 1998 (1998)
  8. Vergin, R., O’Shaughnessy, D., Farhat, A.: Generalized Mel Frequency Cepstral Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition. IEEE Transactions on Speech and Audio Processing 7(5), 525–532 (1999)
  9. Auckenthaler, R., Carey, M., Lloyd-Thomas, H.: Score normalization for text-independent speaker verification systems. Digital Signal Processing 10, 42–54 (2000)
  10. Shan, Z., Yang, Y., Wu, Z.: Natural-Emotion GMM Transformation Algorithm for Emotional Speaker Recognition. In: InterSpeech 2007, pp. 782–785 (2007)
  11. Shan, Z., Yang, Y.: Polynomial Function Based Neutral-Emotion GMM Transformation for Speaker Recognition. In: ICPR 2008 (2008) (accepted)
  12. Shan, Z., Yang, Y., Wu, Z.: SCS: A Speech Check-in System. In: The 8th International Conference on Signal Processing, vol. 4, pp. 752–756 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Zhenyu Shan (1)
  • Yingchun Yang (1)

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou, China