Abstract
Humans use multiple modalities, such as facial expression, speech, and body gesture, to express their feelings. To build emotionally aware computers and make human-computer interaction (HCI) more natural and friendly, computers should therefore be able to understand human emotions from speech and visual information. In this paper, we recognize emotions from audio and visual information using a fuzzy ARTMAP neural network (FAMNN). The audio and visual systems are fused at the feature level and at the decision level. Finally, particle swarm optimization (PSO) is employed to determine optimum values of the FAMNN choice parameter (α), vigilance parameters (ρ), and learning rate (β). Experimental results show that both feature-level and decision-level fusion improve on the unimodal systems, and that PSO further increases the recognition rate. With the PSO-optimized FAMNN and feature-level fusion, the recognition rate improved by about 57 % with respect to the audio-only system and by about 4.5 % with respect to the visual-only system. The final emotion recognition rate on the SAVEE database reached 98.25 % using audio and visual features with the optimized FAMNN.
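The abstract only names the optimization step, so the Python sketch below is an illustrative reconstruction, not the authors' implementation. It runs a plain global-best PSO over the three FAMNN hyperparameters mentioned above: the choice parameter α, the vigilance parameter ρ, and the learning rate β. The function evaluate_famnn is a hypothetical placeholder for training a fuzzy ARTMAP on the fused audio-visual features and returning the validation recognition rate; a smooth toy objective stands in so the sketch runs end to end. The particle count, iteration budget, inertia weight, acceleration coefficients, and search bounds are assumptions, not values reported in the paper.

import numpy as np

def evaluate_famnn(alpha, rho, beta):
    # Hypothetical placeholder: in the real system this would train a fuzzy
    # ARTMAP with the given (alpha, rho, beta) on the fused audio-visual
    # features and return the validation recognition rate. A smooth toy
    # surface stands in so the sketch is runnable.
    return -((alpha - 0.1) ** 2 + (rho - 0.75) ** 2 + (beta - 0.9) ** 2)

def pso_optimize(n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed search bounds: alpha in (0, 1], rho and beta in [0, 1].
    lo = np.array([1e-6, 0.0, 0.0])
    hi = np.array([1.0, 1.0, 1.0])

    pos = rng.uniform(lo, hi, size=(n_particles, 3))   # particle positions
    vel = np.zeros_like(pos)                           # particle velocities
    pbest_pos = pos.copy()                             # personal bests
    pbest_fit = np.array([evaluate_famnn(*p) for p in pos])
    g = pbest_fit.argmax()
    gbest_pos, gbest_fit = pbest_pos[g].copy(), pbest_fit[g]  # global best

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, 1))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([evaluate_famnn(*p) for p in pos])
        improved = fit > pbest_fit
        pbest_pos[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.max() > gbest_fit:
            g = pbest_fit.argmax()
            gbest_pos, gbest_fit = pbest_pos[g].copy(), pbest_fit[g]

    return gbest_pos, gbest_fit  # best (alpha, rho, beta) and its fitness

if __name__ == "__main__":
    (alpha, rho, beta), fit = pso_optimize()
    print(f"alpha={alpha:.3f}, rho={rho:.3f}, beta={beta:.3f}, fitness={fit:.4f}")

In the paper itself the fitness would be the recognition rate of the FAMNN on held-out data, evaluated separately for the feature-level and decision-level fusion configurations.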
Acknowledgments
This work was supported by Islamic Azad University-South Tehran Branch under a research project entitled “Audio-Visual Emotion Modeling to Improve human–computer interaction”.
About this article
Cite this article
Gharavian, D., Bejani, M. & Sheikhan, M. Audio-visual emotion recognition using FCBF feature selection method and particle swarm optimization for fuzzy ARTMAP neural networks. Multimed Tools Appl 76, 2331–2352 (2017). https://doi.org/10.1007/s11042-015-3180-6