Towards Crossmodal Learning for Smooth Multimodal Attention Orientation

  • Frederik Haarslev
  • David Docherty
  • Stefan-Daniel Suvei
  • William Kristian Juel
  • Leon Bodenhagen (corresponding author)
  • Danish Shaikh
  • Norbert Krüger
  • Poramate Manoonpong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11357)


Orienting attention towards another person of interest is a fundamental social behaviour prevalent in human-human interaction and crucial in human-robot interaction. This orientation behaviour is often governed by the received audio-visual stimuli. We present an adaptive neural circuit for multisensory attention orientation that combines auditory and visual directional cues. The circuit learns to integrate sound direction cues, extracted via a model of the peripheral auditory system of lizards, with visual directional cues obtained via deep-learning-based object detection. We implement the neural circuit on a robot and demonstrate that integrating multisensory information via the circuit generates appropriate motor velocity commands that control the robot’s orientation movements. We experimentally validate the adaptive neural circuit for a co-located human target and loudspeaker emitting a fixed tone.
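The idea of learning to combine two directional cues into one motor velocity command can be illustrated with a small sketch. It is not the circuit from the paper: the abstract does not specify the learning rule, so the code assumes a correlation-based update in the style of input correlation (ICO) learning, treats the auditory weight as plastic and the visual weight as fixed, and replaces both sensor models with simple functions of the angular error. All function names, parameters, and values are hypothetical.

```python
import math


def velocity_command(w_audio, audio_cue, w_visual, visual_cue):
    """Weighted sum of the two directional cues, used as angular velocity."""
    return w_audio * audio_cue + w_visual * visual_cue


def ico_update(w_plastic, plastic_input, fixed_input, fixed_input_prev, lr=0.1):
    """ICO-style rule (assumed): the weight change is proportional to the
    plastic input times the discrete derivative of the fixed input."""
    return w_plastic + lr * plastic_input * (fixed_input - fixed_input_prev)


def run_trial(w_audio, target_angle, steps=50, dt=0.1, w_visual=1.0, lr=0.1):
    """Turn a simulated robot towards a co-located audio-visual target.

    Both cues are stand-ins (assumptions), not the lizard ear model or the
    object detector mentioned in the abstract.
    """
    heading = 0.0
    prev_visual = 0.0
    for _ in range(steps):
        error = target_angle - heading
        audio_cue = math.tanh(error)            # stand-in auditory direction cue
        visual_cue = max(-1.0, min(1.0, error))  # stand-in visual direction cue
        w_audio = ico_update(w_audio, audio_cue, visual_cue, prev_visual, lr)
        prev_visual = visual_cue
        heading += dt * velocity_command(w_audio, audio_cue, w_visual, visual_cue)
    return heading, w_audio


if __name__ == "__main__":
    final_heading, learned_w = run_trial(w_audio=0.0, target_angle=0.5)
    print(f"final heading: {final_heading:.3f} rad, audio weight: {learned_w:.4f}")
```

In this toy setting the fixed visual pathway alone already turns the robot towards the target; the plastic auditory weight only changes while the visual cue is changing, which is the defining property of the assumed rule.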


Keywords: Sensor fusion, Neural control, Human-robot interaction



This research was part of the SMOOTH project (project number 6158-00009B), funded by Innovation Fund Denmark.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Frederik Haarslev (1)
  • David Docherty (2)
  • Stefan-Daniel Suvei (1)
  • William Kristian Juel (1)
  • Leon Bodenhagen (1), corresponding author
  • Danish Shaikh (2)
  • Norbert Krüger (1)
  • Poramate Manoonpong (2, 3)

  1. SDU Robotics, Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
  2. SDU Embodied Systems for Robotics and Learning, Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
  3. Bio-inspired Robotics and Neural Engineering Laboratory, School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Wangchan, Thailand
