Emotion Speech Recognition Based on Adaptive Fractional Deep Belief Network and Reinforcement Learning

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 768)


Identifying emotion is a challenging task in the rapidly developing field of human–computer interaction. Speech Emotion Recognition (SER) can be characterized as the extraction of a speaker's emotional state from their spoken utterances. Detecting emotion is difficult for a computer because its expression varies from speaker to speaker. To address this problem, the proposed system is based on an Adaptive Fractional Deep Belief Network (AFDBN) and Reinforcement Learning (RL). Pitch chroma, spectral flux, tonal power ratio, and Mel-frequency cepstral coefficient (MFCC) features are extracted from the speech signal. The extracted features are then passed to the classifier. Finally, performance is analyzed using evaluation metrics and compared with that of existing systems.
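As an illustration of one of the features named above, the following is a minimal sketch of spectral flux, computed as the normalized Euclidean distance between the magnitude spectra of successive frames. This is a common textbook definition and is not necessarily the exact formulation used in the paper; the function names, frame length, hop size, and toy signal are all assumptions chosen for the example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of the one-sided DFT of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    half = n // 2 + 1
    mags = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_flux(signal, frame_len=64, hop=32):
    """Distance between magnitude spectra of successive frames, one value per frame pair."""
    # Slice the signal into overlapping frames (assumed sizes, no windowing).
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = [magnitude_spectrum(f) for f in frames]
    flux = []
    for prev, cur in zip(spectra, spectra[1:]):
        diff_sq = sum((c - p) ** 2 for p, c in zip(prev, cur))
        # Normalize by the number of frequency bins.
        flux.append(math.sqrt(diff_sq) / len(cur))
    return flux


# Toy signal (hypothetical): a steady 64 Hz tone followed by a 192 Hz tone,
# sampled at an assumed rate of 1024 Hz.
sr = 1024
steady = [math.sin(2 * math.pi * 64 * t / sr) for t in range(128)]
changed = [math.sin(2 * math.pi * 192 * t / sr) for t in range(128)]
flux = spectral_flux(steady + changed)
```

In this toy signal the flux stays near zero while the tone is steady and rises only for the frame pairs that straddle the change in frequency, which is the kind of spectral variation the feature is meant to capture. A practical system would use an FFT and a window function rather than the naive DFT shown here.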


Keywords: Emotion recognition; Adaptive fractional deep belief network; Reinforcement learning



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Information Technology/SOC, SASTRA Deemed University, Thanjavur, India
  2. Department of ECE, University College of Engineering, BIT Campus, Anna University, Tiruchirappalli, India
