Brain-Like Evolving Spiking Neural Networks for Multimodal Information Processing

  • Simei Gomes Wysoski
  • Lubica Benuskova
  • Nikola Kasabov
Part of the Studies in Computational Intelligence book series (SCI, volume 266)


Despite much evidence on how and where sensory information converges in the human brain, the neural mechanisms of interaction among modalities at the level of neuronal cells and ensembles are still not well understood. This chapter explores the emulation of multimodal information processing in a brain-like manner through evolving spiking neural network (ESNN) architectures that use several multimodal characteristics of biological brains, e.g., multisensory neurons, crossmodal connections, the capacity for lifelong adaptation and evolution, and adaptive pattern recognition. The approach is illustrated with an audiovisual ESNN for the person authentication problem. Preliminary results show that the integrated system can improve accuracy at many operating points and also enables a range of multi-criteria optimizations.
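In a verification system, an "operating point" is the pair of false acceptance rate (FAR) and false rejection rate (FRR) obtained for one choice of decision threshold; moving the threshold trades one error for the other. The snippet below is a minimal sketch of that idea, assuming each modality (face, voice) outputs a similarity score in [0, 1] and using a plain weighted-sum score fusion. That fusion rule is a conventional baseline swapped in for illustration only; it is not the authors' ESNN integration. The names `far_frr` and `fuse`, the weight `w`, and all scores are hypothetical, invented toy values rather than results from the chapter.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when an identity claim is accepted if score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors wrongly accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users wrongly rejected
    return far, frr


def fuse(face: float, voice: float, w: float = 0.5) -> float:
    """Hypothetical weighted-sum fusion of two modality scores."""
    return w * face + (1.0 - w) * voice


# Invented toy scores for illustration only (not data from the chapter).
genuine_face = [0.9, 0.7, 0.4, 0.8]
genuine_voice = [0.6, 0.8, 0.9, 0.7]
impostor_face = [0.3, 0.6, 0.2, 0.5]
impostor_voice = [0.4, 0.2, 0.7, 0.3]

genuine_fused = [fuse(f, v) for f, v in zip(genuine_face, genuine_voice)]
impostor_fused = [fuse(f, v) for f, v in zip(impostor_face, impostor_voice)]

# Sweep a few thresholds: each one yields a different operating point per system.
for t in (0.35, 0.55, 0.75):
    print(f"t={t}: face={far_frr(genuine_face, impostor_face, t)} "
          f"voice={far_frr(genuine_voice, impostor_voice, t)} "
          f"fused={far_frr(genuine_fused, impostor_fused, t)}")
```

With these toy numbers, at a threshold of 0.55 the face and voice scores each commit some errors on their own, while the fused score separates the two classes; this only illustrates what "improving accuracy at an operating point" means and says nothing about how the chapter's ESNN performs.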


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Simei Gomes Wysoski (1)
  • Lubica Benuskova (1)
  • Nikola Kasabov (1)
  1. Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, AUT Tech Park, Auckland, New Zealand
