An Object-Oriented Approach Using a Top-Down and Bottom-Up Process for Manipulative Action Recognition

  • Zhe Li
  • Jannik Fritsch
  • Sven Wachsmuth
  • Gerhard Sagerer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4174)


Unlike many gesture-based human-robot interaction systems, which focus on recognizing interactional or pointing gestures, this paper proposes a vision-based method for manipulative gesture recognition that aims at natural, proactive, and non-intrusive interaction between humans and robots. The main contributions are an object-centered scheme for segmenting and characterizing hand-trajectory information, the use of particle filtering for spotting action primitives, and a tight coupling of bottom-up and top-down processing that realizes a task-driven attention filter for the low-level recognition steps. In contrast to purely trajectory-based techniques, the presented approach is object-oriented in two respects: it is object-centered in that trajectory features are defined relative to an object, and it uses object-specific models for the action primitives. The system has a two-layer structure that recognizes both the HMM-modeled manipulative primitives and the underlying task, which is characterized by the sequence of manipulative primitives. The proposed top-down and bottom-up mechanism between the two layers reduces the image-processing load and improves the recognition rate.
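To make the idea of object-centered trajectory features concrete, the sketch below computes, for each hand position in a trajectory, its distance to a reference object, the bearing angle toward it, and a per-frame radial velocity (rate of approach). This is a minimal illustration of features defined relative to an object; the function name and the specific feature set are assumptions for illustration, not the paper's exact definitions.

```python
import math

def object_centered_features(hand_traj, obj_pos):
    """Compute illustrative object-centered features for a 2-D hand
    trajectory: (distance to object, bearing angle, radial velocity).
    Radial velocity is the frame-to-frame change in distance, so a
    negative value means the hand is approaching the object."""
    feats = []
    prev_dist = None
    for hx, hy in hand_traj:
        dx, dy = hx - obj_pos[0], hy - obj_pos[1]
        dist = math.hypot(dx, dy)          # Euclidean distance to object
        angle = math.atan2(dy, dx)         # bearing of hand w.r.t. object
        radial_vel = 0.0 if prev_dist is None else dist - prev_dist
        feats.append((dist, angle, radial_vel))
        prev_dist = dist
    return feats

# A hand approaching an object at the origin along the x-axis:
traj = [(4.0, 0.0), (3.0, 0.0), (2.0, 0.0)]
print(object_centered_features(traj, (0.0, 0.0)))
```

Such relative features stay comparable across scenes regardless of where the object sits in the image, which is what makes object-specific primitive models (e.g. "reach toward cup") trainable from absolute hand positions.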


Keywords: Gesture Recognition · Object Type · Action Primitive · Hand Trajectory · Object Context





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zhe Li¹
  • Jannik Fritsch¹
  • Sven Wachsmuth¹
  • Gerhard Sagerer¹

  1. Bielefeld University, Bielefeld, Germany
