Design of a hierarchy modular neural network and its application in multimodal emotion recognition

  • Wenjing Li
  • Minghui Chu
  • Junfei Qiao
Methodologies and Application


Fusing information from different modalities is a critical issue in multimodal emotion recognition. Feature-level fusion methods cannot deal with missing or corrupted data, while decision-level fusion methods may lose the correlation information between modalities. To address both problems, a hierarchy modular neural network (HMNN) is proposed and applied to multimodal emotion recognition. First, an HMNN is constructed to mimic the hierarchical modular architecture observed in the human brain. Each module contains several submodules, each dealing with features from a different modality. Connections are built between submodules within the same module and between corresponding submodules from different modules. Then, a learning algorithm based on Hebbian learning, which simulates the learning mechanism of the human brain, is used to train the connection weights in HMNN. HMNN recognizes the label from the activity level of each module using a winner-take-all strategy. Finally, the proposed HMNN is applied to a public dataset for multimodal emotion recognition. Experimental results show that the proposed HMNN improves the recognition results compared with other decision-fusion methods, including support vector machines, as well as neural networks such as back-propagation and radial basis function neural networks. Furthermore, the inter-submodule connections within a module integrate information from different modalities and improve the performance of HMNN. The experiments also suggest that HMNN is effective in dealing with missing or corrupted data.
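The mechanism summarized above (one module per label, one submodule per modality, Hebbian weight updates, winner-take-all readout) can be illustrated with a small sketch. This is not the authors' algorithm: the class name ToyModularNet, the modality names "eeg" and "face", the labels, the learning rate, and the exact update rules are all assumptions made for illustration, and the connections between corresponding submodules of different modules are omitted for brevity.

```python
from typing import Dict, List, Optional, Sequence

# A sample maps a modality name to its feature vector, or to None when
# that modality is missing or corrupted.
Sample = Dict[str, Optional[Sequence[float]]]


class ToyModularNet:
    """Illustrative only: one module per label, one submodule per modality."""

    def __init__(self, labels: Sequence[str], dims: Dict[str, int], lr: float = 0.1):
        self.labels: List[str] = list(labels)
        self.lr = lr
        # w[label][modality]: feature-to-submodule weights.
        self.w = {c: {m: [0.0] * d for m, d in dims.items()} for c in self.labels}
        # v[label][(a, b)]: coupling between submodules a and b of one module.
        self.v = {c: {(a, b): 0.0 for a in dims for b in dims if a != b}
                  for c in self.labels}

    def train(self, sample: Sample, label: str) -> None:
        present = [m for m, x in sample.items() if x is not None]
        # Hebbian update: pre-synaptic activity x_i times post-synaptic
        # activity, which is clamped to 1 for the labelled module (assumption).
        for m in present:
            weights = self.w[label][m]
            for i, xi in enumerate(sample[m]):
                weights[i] += self.lr * xi * 1.0
        # Strengthen coupling between submodules that receive input together.
        for a in present:
            for b in present:
                if a != b:
                    self.v[label][(a, b)] += self.lr

    def module_activity(self, label: str, sample: Sample) -> float:
        # Missing modalities are skipped, so they contribute nothing.
        acts = {m: sum(wi * xi for wi, xi in zip(self.w[label][m], x))
                for m, x in sample.items() if x is not None}
        total = sum(acts.values())
        # Lateral input from the other active submodules in the same module.
        for a in acts:
            for b in acts:
                if a != b:
                    total += self.v[label][(a, b)] * acts[b]
        return total

    def predict(self, sample: Sample) -> str:
        # Winner-take-all: the most active module gives the label.
        return max(self.labels, key=lambda c: self.module_activity(c, sample))


net = ToyModularNet(["calm", "excited"], {"eeg": 2, "face": 2})
net.train({"eeg": [1.0, 0.0], "face": [1.0, 0.0]}, "calm")
net.train({"eeg": [0.0, 1.0], "face": [0.0, 1.0]}, "excited")
full = net.predict({"eeg": [1.0, 0.0], "face": [1.0, 0.0]})
partial = net.predict({"eeg": [0.0, 1.0], "face": None})
```

In this toy setup a sample with a missing modality is still classified from the remaining submodule, which mirrors, in a much simplified form, the robustness to missing data claimed in the abstract.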


Hierarchy modular neural network (HMNN); Inter-submodule connections; Hebbian learning rule; Multimodal emotion recognition



This work was supported by the National Natural Science Foundation of China (No. 61603009); the Beijing Natural Science Foundation (No. 4182007); the Beijing Municipal Education Commission Foundation (No. KM201910005023); the Key Project of National Natural Science Foundation of China (No. 61533002); and the “Rixin Scientist” Foundation of Beijing University of Technology (No. 2017-RX(1)-04).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Information Technology, Beijing University of Technology, Beijing, China
  2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing, China
