The Video Conference Tool Robot ViCToR
We present a robotic tool that autonomously follows a conversation to enable remote presence in video conferencing. When humans participate in a meeting through video conferencing tools, it is crucial that they can follow the conversation through both acoustic and visual input. To this end, we design and implement a video conferencing tool robot that uses binaural sound source localization as its main cue to autonomously orient towards the current speaker. To make the acoustic cue more robust against noise, we supplement the sound localization with a source detection stage. We also include a simple onset detector to retain fast response times. Since we use only two microphones, sound alone leaves it ambiguous whether a source is in front of or behind the device. We resolve this ambiguity with the help of face detection and additional movements. We tailor the system to our target scenarios in experiments with a four-minute scripted conversation, in which we evaluate how different system settings influence the responsiveness and accuracy of the device.
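The abstract does not describe the localization model in detail. As a minimal sketch of why two microphones leave a front/back ambiguity, the Python code below estimates the interaural time difference (ITD) by brute-force cross-correlation and maps it to an azimuth with a simple far-field model, which yields two mirror-image candidates. The constants (`MIC_DISTANCE`, `SAMPLE_RATE`, `SPEED_OF_SOUND`), the helper names, and the face-detection rule in `resolve_front_back` are all hypothetical illustrations; this is not the localization pipeline used by ViCToR.

```python
import math
import random

# Hypothetical constants for illustration only; the abstract does not state
# the microphone spacing or sample rate of the device.
SPEED_OF_SOUND = 343.0   # m/s, air at roughly 20 degrees C
MIC_DISTANCE = 0.2       # m, assumed spacing between the two microphones
SAMPLE_RATE = 16000      # Hz, assumed


def estimate_lag(left, right, max_lag):
    """Return the lag k (samples) maximising sum_n left[n] * right[n + k].

    A positive lag means the right channel lags, i.e. the sound reached
    the left microphone first.
    """
    best_lag, best_score = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        # Sum only over indices where both signals are defined.
        start, stop = max(0, -k), min(len(left), len(right) - k)
        score = sum(left[i] * right[i + k] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag


def itd_to_azimuths(itd, mic_distance=MIC_DISTANCE, c=SPEED_OF_SOUND):
    """Map an ITD in seconds to (front, back) azimuth candidates in degrees.

    Far-field model: itd = d * sin(theta) / c, with 0 degrees straight ahead
    and positive angles to the left. Two microphones cannot distinguish
    theta from its mirror image 180 - theta behind the device.
    """
    x = max(-1.0, min(1.0, c * itd / mic_distance))  # clamp noisy estimates
    front = math.degrees(math.asin(x))
    back = (180.0 - front) % 360.0
    if back > 180.0:
        back -= 360.0  # normalise to the range (-180, 180]
    return front, back


def resolve_front_back(front, back, face_in_view):
    """Hypothetical rule: keep the front candidate only if the camera sees a
    face after turning towards it; otherwise turn to the back candidate."""
    return front if face_in_view else back


if __name__ == "__main__":
    rng = random.Random(0)
    true_delay = 5  # samples; the right channel lags, so the talker is on the left
    left = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    right = [0.0] * true_delay + left[:-true_delay]
    max_lag = math.ceil(MIC_DISTANCE / SPEED_OF_SOUND * SAMPLE_RATE)
    lag = estimate_lag(left, right, max_lag)
    front, back = itd_to_azimuths(lag / SAMPLE_RATE)
    print(lag, round(front, 1), round(back, 1))
```

With these assumed values, a delay of five samples gives a front candidate of roughly 32 degrees to the left and a back candidate of roughly 148 degrees; the ITD alone cannot decide between them, which is why an additional cue such as a visible face or a further movement is needed.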
Keywords: Decay Constant, Sound Source, Face Detection, Sound Localization, Onset Detector