A Controlling Strategy for an Active Vision System Based on Auditory and Visual Cues

  • Miranda Grahl
  • Frank Joublin
  • Franz Kummert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6353)

Abstract

It is still an open question how preliminary visual reflexes can be structured by auditory and visual modalities in order to recognize objects. We therefore propose a new controlling strategy for an active vision system that learns to focus on relevant multimodal aspects of the environment. The method is bootstrapped by a bottom-up visual saliency process that extracts important visual points. In this paper, we present our first results, focusing on the unsupervised generation of training data for multimodal object recognition. The performance is compared against a human-evaluated database.
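The abstract refers to associating auditory and visual cues, and the keywords highlight mutual information as the linking measure (cf. refs. 6 and 8 below). The paper does not publish its implementation; the following is a minimal, hypothetical sketch of how mutual information can select the salient image region most synchronous with an audio signal. The toy data, region names, and thresholded binary signals are assumptions for illustration only.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(X;Y) in bits between two
    equal-length symbol sequences (here: binarized activity traces)."""
    n = len(xs)
    assert n == len(ys) and n > 0
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint * log2( p_joint / (p(x) * p(y)) )
        mi += p_joint * math.log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# Hypothetical example: audio activity (1 = sound present) versus the
# on/off activity of two salient image regions over 12 frames.
audio    = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
region_a = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]  # synchronous with audio
region_b = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]  # shifted, uncorrelated

# The region sharing the most information with the audio stream is
# treated as the audio-visual object candidate.
best = max([("a", region_a), ("b", region_b)],
           key=lambda r: mutual_information(audio, r[1]))
print(best[0])  # region "a" wins
```

In a full system, the binary traces would come from thresholded audio energy and from saliency-map activity at candidate locations, computed over a sliding window of frames.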

Keywords

Mutual Information · Object Recognition · Active Vision · Visual Saliency · Object Recognition System

References

  1. Walther, D., Koch, C.: Modeling attention to salient proto-objects. Neural Networks 19, 1395–1407 (2006)
  2. Newell, F.N.: Cross-modal object recognition. In: Calvert, G., Spence, C., Stein, B.E. (eds.) The Handbook of Multisensory Processes, pp. 123–139. MIT Press, Cambridge (2004)
  3. Xiao, M., Wong, M., Umali, M., Pomplun, M.: Using eye-tracking to study audio-visual perceptual integration. Perception 36(9), 1391–1395 (2007)
  4. Lehmann, S., Murray, M.M.: The role of multisensory memories in unisensory object discrimination. Cognitive Brain Research 24(2), 326–334 (2005)
  5. Molholm, S., Ritter, W., Javitt, D.C., Foxe, J.J.: Multisensory visual-auditory object recognition in humans: a high-density electrical mapping study. Cerebral Cortex 14, 452–465 (2004)
  6. Roy, D.: Learning audio-visual associations using mutual information. In: Proceedings of the International Workshop on Integrating Speech and Image Understanding, pp. 147–163 (1999)
  7. Hershey, J., Movellan, J.: Audio-vision: Using audio-visual synchrony to locate sounds. In: Advances in Neural Information Processing Systems, pp. 813–819 (1999)
  8. Rolf, M., Hanheide, M., Rohlfing, K.: Attention via synchrony: Making use of multimodal cues in social learning. IEEE Transactions on Autonomous Mental Development, 55–67 (2009)
  9. Grahl, M., Joublin, F., Kummert, F.: A method for multi modal object recognition based on self-referential classification strategies. European Patent Application, No. 09177019.8, pending (2009)
  10. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), 1254–1259 (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Miranda Grahl (1)
  • Frank Joublin (2)
  • Franz Kummert (1)
  1. CoR-Lab, Bielefeld University, Bielefeld, Germany
  2. Honda Research Institute Europe GmbH, Offenbach, Germany
