Multimedia Tools and Applications, Volume 78, Issue 6, pp 6441–6458

Emotional speaker recognition in real life conditions using multiple descriptors and i-vector speaker modeling technique

  • Asma Mansour
  • Farah Chenchah
  • Zied Lachiri


Abstract

Emotional speaker recognition under real-life conditions has become an urgent need for several applications. This paper proposes a novel approach that combines multiple feature extraction methods with the i-vector modeling technique in order to improve emotional speaker recognition under real conditions. The performance of the proposed approach is evaluated on real-condition speech signals (the IEMOCAP corpus) in both clean and noisy environments at various SNR levels. We examined several spectral features widely used in speaker recognition (MFCC, LPCC and RASTA-PLP) and derived combined features, the MFCC-SDC coefficients. The feature vectors are then classified using multiclass Support Vector Machines (SVM). Experimental results show that the proposed system is robust both to talking conditions (emotions) and to real-life environments (noise). Moreover, the results reveal that MFCC-SDC features outperform the conventional MFCCs.
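The MFCC-SDC features mentioned in the abstract append shifted delta cepstra (SDC) to the static MFCCs. As an illustrative sketch only (the paper's exact settings are not given here), the following NumPy snippet computes SDC vectors under the commonly used d-P-k parameterisation with d = 1, P = 3, k = 7, which are assumed defaults rather than values confirmed by the authors:

```python
import numpy as np

def sdc(cepstra, d=1, p=3, k=7):
    """Shifted Delta Cepstra under the usual d-P-k scheme (assumed defaults).

    cepstra: (T, N) array of per-frame cepstral coefficients (e.g. MFCCs).
    Returns a (T', N*k) array where each row stacks k delta blocks."""
    T, _ = cepstra.shape
    rows = []
    for t in range(T):
        blocks = []
        for i in range(k):
            hi = t + i * p + d          # frame ahead of the shifted centre
            lo = t + i * p - d          # frame behind the shifted centre
            if hi >= T:                 # ran out of future context
                break
            blocks.append(cepstra[hi] - cepstra[max(lo, 0)])
        if len(blocks) == k:            # keep only fully defined frames
            rows.append(np.concatenate(blocks))
    return np.array(rows)
```

For each frame t, the i-th block is c(t + iP + d) − c(t + iP − d) for i = 0..k−1; stacking the k blocks yields one long vector that captures longer-term spectral dynamics than plain delta coefficients, which is what motivates combining SDC with static MFCCs.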


Keywords: Speaker recognition · Emotion · I-vector · MFCC-SDC · SVM · Noise


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. National School of Engineering of Tunis, LR SITI Laboratory, University of Tunis El Manar, Tunis, Tunisia
