Social Interaction of Humanoid Robot Based on Audio-Visual Tracking

  • Hiroshi G. Okuno
  • Kazuhiro Nakadai
  • Hiroaki Kitano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2358)


Social interaction is essential in improving the human-robot interface. Behaviors for social interaction may include paying attention to a new sound source, moving toward it, or keeping face-to-face with a moving speaker. Some sound-centered behaviors are difficult to attain, because mixtures of sounds are not handled well or auditory processing is too slow for real-time applications. Recently, Nakadai et al. have developed real-time auditory and visual multiple-talker tracking technology based on associating auditory and visual streams. The system is implemented on an upper-torso humanoid, and real-time talker tracking is attained with a delay of 200 msec by distributed processing on four PCs connected by Gigabit Ethernet. Focus-of-attention is programmable and allows a variety of behaviors. The system demonstrates non-verbal social interaction by realizing a receptionist robot that focuses on an associated stream and a companion robot that focuses on an auditory stream.
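The abstract's mechanism of associating auditory and visual streams, with a programmable focus-of-attention policy selecting which stream to attend to, can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual implementation: the `Stream` classes, azimuth-tolerance matching, and the `"receptionist"`/`"companion"` policy names are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Stream:
    kind: str        # "auditory" or "visual"
    azimuth: float   # estimated talker direction, in degrees

@dataclass
class AssociatedStream:
    auditory: Stream
    visual: Stream

    @property
    def azimuth(self) -> float:
        # Combine the two directional estimates.
        return (self.auditory.azimuth + self.visual.azimuth) / 2.0

def associate(auditory: List[Stream], visual: List[Stream],
              tolerance_deg: float = 10.0) -> List[AssociatedStream]:
    """Pair auditory and visual streams whose directions agree within a
    tolerance; each visual stream is used at most once."""
    pairs: List[AssociatedStream] = []
    used = set()
    for a in auditory:
        best, best_diff = None, tolerance_deg
        for i, v in enumerate(visual):
            diff = abs(a.azimuth - v.azimuth)
            if i not in used and diff <= best_diff:
                best, best_diff = i, diff
        if best is not None:
            used.add(best)
            pairs.append(AssociatedStream(a, visual[best]))
    return pairs

def focus_of_attention(associated: List[AssociatedStream],
                       auditory: List[Stream],
                       policy: str) -> Optional[object]:
    """Programmable focus-of-attention: a 'receptionist' prefers an
    associated stream (a talker both seen and heard), while a
    'companion' prefers a purely auditory stream."""
    if policy == "receptionist" and associated:
        return associated[0]
    if auditory:
        return auditory[0]
    return associated[0] if associated else None
```

In this sketch, changing the policy string is all that distinguishes the two robot roles described in the abstract; the real system would also track streams over time and drive motor behavior toward the attended direction.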


Keywords: Sound Source, Humanoid Robot, Visual Stream, Auditory Event, Auditory Stream


References

  1. Breazeal, C., and Scassellati, B. A context-dependent attention system for a social robot. Proceedings of the Sixteenth International Joint Conf. on Artificial Intelligence (IJCAI-99), 1146–1151.
  2. Breazeal, C. Emotive qualities in robot speech. Proc. of IEEE/RSJ International Conf. on Intelligent Robots and Systems (IROS-2001), 1389–1394.
  3. Brooks, R. A., Breazeal, C., Irie, R., Kemp, C. C., Marjanovic, M., Scassellati, B., and Williamson, M. M. Alternative essences of intelligence. Proc. of 15th National Conf. on Artificial Intelligence (AAAI-98), 961–968.
  4. Horvitz, E., and Paek, T. A computational architecture for conversation. Proc. of Seventh International Conf. on User Modeling (1999), Springer, 201–210.
  5. Kagami, S., Okada, K., Inaba, M., and Inoue, H. Real-time 3D optical flow generation system. Proc. of International Conf. on Multisensor Fusion and Integration for Intelligent Systems (MFI'99), 237–242.
  6. Kawahara, T., Lee, A., Kobayashi, T., Takeda, K., Minematsu, N., Itou, K., Ito, A., Yamamoto, M., Yamada, A., Utsuro, T., and Shikano, K. Japanese dictation toolkit (1997 version). Journal of the Acoustical Society of Japan (E) 20, 3 (1999), 233–239.
  7. Matsusaka, Y., Tojo, T., Kubota, S., Furukawa, K., Tamiya, D., Hayata, K., Nakano, Y., and Kobayashi, T. Multi-person conversation via multi-modal interface: a robot who communicates with multi-user. Proc. of 6th European Conf. on Speech Communication and Technology (EUROSPEECH-99), ESCA, 1723–1726.
  8. Nakadai, K., Lourens, T., Okuno, H. G., and Kitano, H. Active audition for humanoid. Proc. of 17th National Conf. on Artificial Intelligence (AAAI-2000), 832–839.
  9. Nakadai, K., Matsui, T., Okuno, H. G., and Kitano, H. Active audition system and humanoid exterior design. Proc. of IEEE/RSJ International Conf. on Intelligent Robots and Systems (IROS-2000), 1453–1461.
  10. Nakadai, K., Hidai, K., Mizoguchi, H., Okuno, H. G., and Kitano, H. Real-time auditory and visual multiple-object tracking for robots. Proc. of the Seventeenth International Joint Conf. on Artificial Intelligence (IJCAI-01), 1425–1432.
  11. Okuno, H., Nakadai, K., Lourens, T., and Kitano, H. Sound and visual tracking for humanoid robot. Proc. of International Conf. on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-2001) (Jun. 2001), LNAI 2070, Springer-Verlag, 640–650.
  12. Okuno, H. G., Nakatani, T., and Kawabata, T. Listening to two simultaneous speeches. Speech Communication 27, 3–4 (1999), 281–298.
  13. Ono, T., Imai, M., and Ishiguro, H. A model of embodied communications with gestures between humans and robots. Proc. of Twenty-third Annual Meeting of the Cognitive Science Society (CogSci2001), AAAI, 732–737.
  14. Waldherr, S., Thrun, S., Romero, R., and Margaritis, D. Template-based recognition of pose and motion gestures on a mobile robot. Proc. of 15th National Conf. on Artificial Intelligence (AAAI-98), 977–982.
  15. Wolfe, J., Cave, K. R., and Franzel, S. Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance 15, 3 (1989), 419–433.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Hiroshi G. Okuno (1, 2)
  • Kazuhiro Nakadai (1)
  • Hiroaki Kitano (2, 3)
  1. Graduate School of Informatics, Kyoto University, Kyoto, Japan
  2. Kitano Symbiotic Systems Project, ERATO, Japan Science and Technology Corp., Tokyo, Japan
  3. Sony Computer Science Laboratories, Inc., Shinagawa, Tokyo, Japan
