Recognition of Gestural Object Reference with Auditory Feedback

  • Ingo Bax
  • Holger Bekel
  • Gunther Heidemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2714)


We present a cognitively motivated vision architecture for evaluating pointing gestures. The system views a scene containing several structured objects and a pointing human hand. A neural classifier estimates the pointing direction; the correspondence to an object is then established using a sub-symbolic representation of both the scene and the pointing direction. The system achieves high robustness because the result (the indicated location) does not depend primarily on the accuracy of the pointing direction classification. Instead, the scene is analysed for low-level saliency features, which restrict the set of all possible pointing locations to a subset of highly likely locations. Transforming the “continuous” pointing problem into a “discrete” one also makes it possible to give auditory feedback whenever the object reference changes, which leads to a significantly improved human-machine interaction.
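
The discretisation idea can be illustrated with a minimal sketch. It is not the architecture described here: it assumes that the saliency analysis has already produced a short list of candidate image locations and that the classifier output is a single angle, and it replaces the sub-symbolic matching with a simple nearest-angle rule. The names `select_target`, `ReferenceTracker` and `max_dev_deg` are hypothetical and chosen only for illustration.

```python
import math


def select_target(hand, direction_deg, candidates, max_dev_deg=30.0):
    """Pick the candidate whose bearing from the hand is closest to the
    estimated pointing direction (assumed nearest-angle rule).

    hand          -- (x, y) position of the pointing hand
    direction_deg -- estimated pointing direction in degrees
    candidates    -- list of (x, y) salient locations (assumed given)
    max_dev_deg   -- hypothetical tolerance; larger deviations are rejected
    Returns the index of the selected candidate, or None.
    """
    best, best_dev = None, math.inf
    for i, (x, y) in enumerate(candidates):
        dx, dy = x - hand[0], y - hand[1]
        if dx == 0 and dy == 0:
            continue  # candidate coincides with the hand: no bearing
        bearing = math.degrees(math.atan2(dy, dx))
        # Smallest absolute angular difference, wrapped into [0, 180].
        dev = abs((bearing - direction_deg + 180.0) % 360.0 - 180.0)
        if dev <= max_dev_deg and dev < best_dev:
            best, best_dev = i, dev
    return best


class ReferenceTracker:
    """Remembers the current object reference and reports changes,
    so that a caller can emit auditory feedback only on a change."""

    def __init__(self):
        self.current = None

    def update(self, target):
        # A missing target (None) keeps the previous reference.
        changed = target is not None and target != self.current
        if changed:
            self.current = target
        return changed


if __name__ == "__main__":
    candidates = [(10, 0), (0, 10), (-10, 0)]
    tracker = ReferenceTracker()
    for direction in (5.0, 20.0, 80.0):
        target = select_target((0, 0), direction, candidates)
        if tracker.update(target):
            print("reference changed to candidate", target)
```

In this sketch a coarse direction estimate is sufficient, because any estimate that lies closer to one candidate than to the others selects the same location. The selection therefore changes in discrete steps, and each step is a natural point at which to give feedback.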





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Ingo Bax (1)
  • Holger Bekel (1)
  • Gunther Heidemann (1)
  1. Neuroinformatics Group, Faculty of Technology, Bielefeld University, Bielefeld, Germany
