Abstract
This paper proposes a speech emotion recognition technique based on an optimized deep neural network. The speech signals are first denoised by a novel adaptive wavelet transform with a modified galactic swarm optimization algorithm (AWT_MGSO). From the denoised speech signals, spectral features such as LPC (linear prediction coefficients), MFCC (Mel-frequency cepstral coefficients), and PSD (power spectral density), together with prosodic features such as energy, entropy, formant frequencies, and pitch, are extracted, and a subset of these features is selected by the adaptive sunflower optimization (ASFO) algorithm. For emotion classification, an optimized deep neural network with the deer hunting optimization algorithm (DNN-DHO) is proposed, and an enhanced squirrel search algorithm is introduced to update the weights of the DNN-DHO classifier. In this study, all eight speech emotions from the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) and TESS (Toronto Emotional Speech Set) databases for English, and from the IITKGP-SEHSC (Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus) database for Hindi, are classified. The experimental results are compared with those of classifiers such as DNN-DHO, DNN (deep neural network), and DAE (deep auto-encoder). The results show that the proposed DNN-DHO classifier achieves a maximum accuracy of 97.85% on the TESS dataset, 97.14% on the RAVDESS dataset, and 93.75% on the IITKGP-SEHSC dataset.
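The abstract names MFCC among the spectral features extracted from the denoised speech. As an illustrative sketch only (not the authors' implementation), the standard MFCC computation can be written in plain NumPy; the frame length, hop size, filter count, and cepstral count below are common defaults chosen for illustration, not values taken from the paper:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Classic MFCC pipeline: pre-emphasis -> framing -> power spectrum
    -> mel filterbank -> log -> DCT-II (keep first n_ceps coefficients)."""
    # Pre-emphasis boosts high frequencies before analysis
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    frames = np.array([sig[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(sig) - n_fft + 1, hop)])
    # Periodogram estimate of the power spectrum per frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel-spaced filterbank
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T  # shape: (num_frames, n_ceps)

# One second of a 440 Hz tone as a stand-in for a speech signal
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(tone)
print(feats.shape)
```

Each row of `feats` is one frame's cepstral vector; in a pipeline like the one described, such frame-level features would typically be pooled into utterance-level statistics before feature selection and classification.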
Agarwal, G., Om, H. Performance of deer hunting optimization based deep learning algorithm for speech emotion recognition. Multimed Tools Appl 80, 9961–9992 (2021). https://doi.org/10.1007/s11042-020-10118-x