Abstract
Despite much evidence suggesting how and where sensory information converges in the human brain, the neural mechanisms of interaction among modalities at the level of neuronal cells and ensembles are still not well understood. This chapter explores the emulation of multimodal information processing in a brain-like manner through evolving spiking neural network (ESNN) architectures that use several multimodal characteristics of biological brains, e.g., multisensory neurons, crossmodal connections, the capacity for lifelong adaptation and evolution, and adaptive pattern recognition. An audiovisual ESNN applied to the person authentication problem serves as an illustration. Preliminary results show that the integrated system can improve accuracy at many operating points and that it enables a range of multi-criteria optimizations.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wysoski, S.G., Benuskova, L., Kasabov, N. (2010). Brain-Like Evolving Spiking Neural Networks for Multimodal Information Processing. In: Hanazawa, A., Miki, T., Horio, K. (eds) Brain-Inspired Information Technology. Studies in Computational Intelligence, vol 266. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04025-2_3
DOI: https://doi.org/10.1007/978-3-642-04025-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04024-5
Online ISBN: 978-3-642-04025-2
eBook Packages: Engineering, Engineering (R0)