The Video Conference Tool Robot ViCToR

  • Tom Goeckel
  • Stefan Schiffer
  • Hermann Wagner
  • Gerhard Lakemeyer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9245)


We present a robotic tool that autonomously follows a conversation to enable remote presence in video conferencing. When humans participate in a meeting with the help of video conferencing tools, it is crucial that they are able to follow the conversation through both acoustic and visual input. To this end, we design and implement a video conferencing tool robot that uses binaural sound source localization as its main cue to autonomously orient towards the person currently speaking. To increase the robustness of the acoustic cue against noise, we supplement the sound localization with a source detection stage. We also include a simple onset detector to retain fast response times. Since we use only two microphones, we are confronted with ambiguities about whether a source is in front of or behind the device. We resolve these ambiguities with the help of face detection and additional moves. We tailor the system to our target scenarios in experiments with a four-minute scripted conversation. In these experiments we evaluate the influence of different system settings on the responsiveness and accuracy of the device.
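
The front-back ambiguity mentioned above can be illustrated with a minimal sketch. It is not the localization method of the paper, which the abstract does not specify in detail. It assumes a plain cross-correlation estimate of the interaural time difference and a far-field model with two free-standing microphones. The function names, the 16 kHz sample rate, and the 0.2 m microphone distance are hypothetical choices made only for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_lag(left, right, max_lag):
    """Lag in samples by which `right` trails `left` (cross-correlation peak)."""
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def candidate_azimuths(lag, sample_rate, mic_distance):
    """Front and back azimuth in degrees (0 = straight ahead, positive = left)."""
    s = SPEED_OF_SOUND * (lag / sample_rate) / mic_distance
    s = max(-1.0, min(1.0, s))  # guard against lags beyond the physical maximum
    front = math.degrees(math.asin(s))
    back = 180.0 - front if front >= 0 else -180.0 - front
    return front, back


# Hypothetical setup: 16 kHz sampling, microphones 0.2 m apart.
SAMPLE_RATE = 16000
MIC_DISTANCE = 0.2
max_lag = math.ceil(MIC_DISTANCE / SPEED_OF_SOUND * SAMPLE_RATE)

# Synthetic noise burst that reaches the right microphone 5 samples late.
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(400)]
right = [0.0] * 5 + left[:-5]

lag = estimate_lag(left, right, max_lag)
front, back = candidate_azimuths(lag, SAMPLE_RATE, MIC_DISTANCE)
print(lag, round(front, 1), round(back, 1))
```

Under this simplified model, both returned angles produce the same time difference at the two microphones, so the acoustic cue alone cannot tell them apart. This is the situation in which the abstract brings in face detection and additional moves.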


Decay Constant, Sound Source, Face Detection, Sound Localization, Onset Detector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tom Goeckel (1)
  • Stefan Schiffer (2)
  • Hermann Wagner (1)
  • Gerhard Lakemeyer (2)

  1. Institute of Biology II, RWTH Aachen University, Aachen, Germany
  2. Knowledge Based Systems Group (KBSG), RWTH Aachen University, Aachen, Germany
