Multimedia Tools and Applications

Volume 76, Issue 2, pp 2331–2352

Audio-visual emotion recognition using FCBF feature selection method and particle swarm optimization for fuzzy ARTMAP neural networks

  • Davood Gharavian
  • Mehdi Bejani
  • Mansour Sheikhan


Humans use many modalities, such as the face, speech, and body gestures, to express their feelings. Therefore, to build emotional computers and make human–computer interaction (HCI) more natural and friendly, computers should be able to understand human feelings from speech and visual information. In this paper, we recognize emotions from audio and visual information using a fuzzy ARTMAP neural network (FAMNN). The audio and visual systems are fused at the decision and feature levels. Finally, particle swarm optimization (PSO) is employed to determine the optimum values of the choice parameter (α), the vigilance parameters (ρ), and the learning rate (β) of the FAMNN. Experimental results showed that feature-level and decision-level fusion improve on the unimodal systems, and that PSO further improves the recognition rate. Using the PSO-optimized FAMNN with feature-level fusion, the recognition rate was improved by about 57 % relative to the audio system and by about 4.5 % relative to the visual system. The final emotion recognition rate on the SAVEE database reached 98.25 % using audio and visual features with the optimized FAMNN.
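The PSO step described above can be sketched as follows. This is a minimal, generic PSO loop for tuning the three FAMNN hyperparameters named in the abstract (choice parameter α, vigilance ρ, learning rate β); the objective function below is a hypothetical stand-in for the classifier's recognition rate on a validation set, and the bounds and swarm settings are illustrative assumptions, not the paper's actual configuration.

```python
import random

def pso(objective, bounds, n_particles=20, n_iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `objective` over box-constrained parameters via basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize particle positions uniformly within the bounds, velocities at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical surrogate objective with a peak at alpha=0.1, rho=0.9, beta=1.0.
# In the paper this role is played by the FAMNN recognition rate.
def recognition_rate(params):
    alpha, rho, beta = params
    return 1.0 - (alpha - 0.1) ** 2 - (rho - 0.9) ** 2 - (beta - 1.0) ** 2

# Search ranges (assumed for illustration): alpha > 0, rho and beta in [0, 1].
best, score = pso(recognition_rate,
                  bounds=[(0.001, 1.0), (0.0, 1.0), (0.0, 1.0)])
```

In the actual system, each call to the objective would train and validate a FAMNN with the candidate (α, ρ, β) values, so the fitness evaluation dominates the runtime rather than the swarm bookkeeping.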


Keywords: Audio-visual emotion recognition · Particle swarm optimization · Fuzzy ARTMAP neural network



This work was supported by Islamic Azad University-South Tehran Branch under a research project entitled “Audio-Visual Emotion Modeling to Improve human–computer interaction”.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Davood Gharavian (1, 2)
  • Mehdi Bejani (3)
  • Mansour Sheikhan (1)
  1. Department of Electrical Engineering, Islamic Azad University, Tehran, Iran
  2. Department of Electrical Engineering, Shahid Beheshti University, Tehran, Iran
  3. Islamic Azad University, Tehran, Iran
