
Audio-visual emotion recognition using FCBF feature selection method and particle swarm optimization for fuzzy ARTMAP neural networks

Published in: Multimedia Tools and Applications

Abstract

Humans use many modalities, such as the face, speech, and body gestures, to express their feelings. Therefore, to build emotionally aware computers and make human-computer interaction (HCI) more natural and friendly, computers should be able to understand human feelings from speech and visual information. In this paper, we recognize emotions from audio and visual information using a fuzzy ARTMAP neural network (FAMNN). The audio and visual systems are fused at the decision and feature levels. Finally, particle swarm optimization (PSO) is employed to determine the optimum values of the choice parameter (α), the vigilance parameters (ρ), and the learning rate (β) of the FAMNN. Experimental results showed that both feature-level and decision-level fusion improve on the unimodal systems, and that PSO further improves the recognition rate. Using the PSO-optimized FAMNN with feature-level fusion, the recognition rate was improved by about 57 % with respect to the audio system and by about 4.5 % with respect to the visual system. The final emotion recognition rate on the SAVEE database reached 98.25 % using audio and visual features with the optimized FAMNN.
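The abstract describes tuning the FAMNN's choice parameter (α), vigilance (ρ), and learning rate (β) with particle swarm optimization. The following is a minimal sketch of that search loop, with a dummy `fitness` function standing in for FAMNN validation accuracy; the search bounds, swarm size, and inertia/acceleration constants are illustrative assumptions, not the paper's settings:

```python
import random

# Hypothetical stand-in for the real objective, which would train a
# fuzzy ARTMAP with (alpha, rho, beta) and return validation accuracy.
# This dummy surface peaks at alpha=0.1, rho=0.75, beta=0.5.
def fitness(alpha, rho, beta):
    return 1.0 - ((alpha - 0.1) ** 2 + (rho - 0.75) ** 2 + (beta - 0.5) ** 2)

def pso(objective, bounds, n_particles=20, n_iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `objective` over box `bounds` with a basic PSO loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize positions uniformly within bounds, velocities at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(*p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it inside the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(*pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Illustrative ranges: alpha > 0, rho and beta in [0, 1].
best, score = pso(fitness, bounds=[(0.001, 1.0), (0.0, 1.0), (0.0, 1.0)])
```

In the paper's setting, each fitness evaluation would involve training and validating a FAMNN, so the swarm size and iteration count trade search quality against training cost.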



Acknowledgments

This work was supported by Islamic Azad University-South Tehran Branch under a research project entitled “Audio-Visual Emotion Modeling to Improve human–computer interaction”.

Author information
Corresponding author

Correspondence to Davood Gharavian.


About this article


Cite this article

Gharavian, D., Bejani, M. & Sheikhan, M. Audio-visual emotion recognition using FCBF feature selection method and particle swarm optimization for fuzzy ARTMAP neural networks. Multimed Tools Appl 76, 2331–2352 (2017). https://doi.org/10.1007/s11042-015-3180-6
