Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm

  • Fatemeh Daneshfar
  • Seyed Jahanshah Kabudian


In recent years, Speech Emotion Recognition (SER) has received considerable attention in the field of affective computing. In this paper, an improved SER system is proposed. In the feature-extraction step, a high-dimensional hybrid feature vector is extracted from both the speech signal and the glottal-waveform signal using techniques such as MFCC, PLPC, and MVDR. Prosodic features derived from the fundamental-frequency (f0) contour are also appended to this feature vector. The proposed system takes a holistic approach: a modified quantum-behaved particle swarm optimization (QPSO) algorithm, called pQPSO, jointly estimates the optimal projection matrix for feature-vector dimension reduction and the parameters of a Gaussian Mixture Model (GMM) classifier. Because the problem parameters lie in a bounded range while the standard QPSO algorithm searches an unbounded one, QPSO is modified here to sample from a truncated probability distribution, which makes the search more efficient. The system runs in real time and is evaluated on three standard emotional speech databases: the Berlin Database of Emotional Speech (EMO-DB), Surrey Audio-Visual Expressed Emotion (SAVEE), and Interactive Emotional Dyadic Motion Capture (IEMOCAP). The proposed method improves SER accuracy over classical methods such as FA, PCA, PPCA, LDA, standard QPSO, wQPSO, and a deep neural network, and also outperforms many recent state-of-the-art approaches on the same datasets.
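The bounded-search idea behind the modified QPSO can be sketched as follows. This is a minimal illustrative sketch only: the function name, the use of rejection sampling, and the fallback clamp are assumptions made here for clarity, and the exact truncated density of the paper's pQPSO is not reproduced. The standard QPSO per-dimension update draws a jump of the form x = p ± β·|mbest − x|·ln(1/u) with u ~ U(0,1), which can land anywhere on the real line; the sketch simply keeps only draws that fall inside the feasible range [lo, hi].

```python
import numpy as np

def qpso_update_bounded(x, pbest_i, gbest, mbest, beta, lo, hi, rng, max_tries=50):
    """One QPSO position update for a single particle, kept inside [lo, hi].

    Illustrative sketch: in-range jumps are obtained by rejection-sampling the
    standard QPSO draw (a stand-in for a truncated distribution), with a clamp
    of the local attractor as a fallback if no draw is accepted.
    """
    dim = len(x)
    new_x = np.empty(dim)
    for d in range(dim):
        phi = rng.random()
        # Local attractor: convex combination of personal and global best.
        p = phi * pbest_i[d] + (1.0 - phi) * gbest[d]
        for _ in range(max_tries):
            u = rng.random()
            sign = 1.0 if rng.random() < 0.5 else -1.0
            # Standard QPSO jump: p +/- beta * |mbest - x| * ln(1/u).
            cand = p + sign * beta * abs(mbest[d] - x[d]) * (-np.log(u))
            if lo <= cand <= hi:  # accept only in-range draws
                break
        else:
            cand = min(max(p, lo), hi)  # fallback: clamp the attractor
        new_x[d] = cand
    return new_x
```

In the paper's setting, each particle's position vector would encode the entries of the projection matrix and the GMM parameters, all of which are naturally bounded, which is what motivates truncating the search distribution rather than clipping after the fact.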


Keywords: Speech emotion recognition · Dimension reduction · Quantum-behaved particle swarm optimization



Acknowledgements

We hereby express our gratitude to Abbas Neekabadi for providing us with some of the source code.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran