Artificial Neural Networks – ICANN 2010, pp. 492–497
A Controlling Strategy for an Active Vision System Based on Auditory and Visual Cues
Abstract
It is still an open question how preliminary visual reflexes can be structured by auditory and visual modalities in order to recognize objects. We therefore propose a new control strategy for an active vision system that learns to focus on relevant multimodal aspects of the environment. The method is bootstrapped by a bottom-up visual saliency process that extracts important visual points. In this paper, we present our first results and focus on the unsupervised generation of training data for multimodal object recognition. The performance is compared against a human-evaluated database.
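The bottom-up saliency bootstrap is described only at a high level in the abstract. As a rough illustration (not the authors' implementation, which follows Itti–Koch-style attention models cited in the paper), a minimal center-surround contrast map over an intensity image might be sketched as follows; the sigma pairs and the test image are assumptions for demonstration only:

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian blur using only NumPy: convolve rows, then columns.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, out)
    return out

def saliency_map(intensity):
    """Crude bottom-up saliency: center-surround contrast at a few scales."""
    sal = np.zeros_like(intensity, dtype=float)
    for c, s in [(1, 4), (2, 8)]:          # center/surround sigma pairs (assumed)
        center = gaussian_blur(intensity, c)
        surround = gaussian_blur(intensity, s)
        sal += np.abs(center - surround)   # contrast = |center - surround|
    return sal / sal.max()                 # normalize to [0, 1]

# The most salient point is simply the argmax of the map.
img = np.zeros((64, 64))
img[30:34, 30:34] = 1.0                    # bright patch on a dark field
sal = saliency_map(img)
y, x = np.unravel_index(np.argmax(sal), sal.shape)
```

In an active vision loop, `(y, x)` would serve as the next fixation candidate; the paper's system then associates such visually salient points with co-occurring auditory cues to generate training data without supervision.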
Keywords
Mutual Information · Object Recognition · Active Vision · Visual Saliency · Object Recognition System