Machine Vision and Applications, Volume 24, Issue 5, pp 931–946

Multi-cue hand detection and tracking for a head-mounted augmented reality system

  • Oytun Akman
  • Ronald Poelman
  • Wouter Caarls
  • Pieter Jonker
Original Paper


With recent developments in wearable augmented reality (AR), natural human–computer interaction is becoming more important. Auxiliary interaction hardware adds complexity, weight and cost to wearable AR systems, so natural means of interaction such as gestures are more desirable. In this paper, we present a novel multi-cue hand detection and tracking method for head-mounted AR systems that combines depth, color, intensity and curvilinearity. Combining different cues increases the detection rate and eliminates background regions, and therefore improves tracking performance under challenging conditions. Detected hand positions and trajectories are used to perform actions such as click and select. Moreover, the 6 DOF poses of the hands are calculated by approximating the segmented regions with planes, in order to render a planar menu (interface) around the hand and to use the hand as a planar selection tool. The proposed system is tested in different scenarios (including markers for reference), and the results show that it can detect and track the hands successfully in challenging conditions such as cluttered and dynamic environments and varying illumination. The proposed hand tracker outperforms other well-known hand trackers under these conditions.
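The plane approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the segmented hand region is already available as a list of 3D points in camera coordinates and fits z = a*x + b*y + c by ordinary least squares. The function name `fit_plane` and the choice of fitting method are assumptions made for illustration; the abstract does not state how the authors fit the plane. The sketch recovers only the plane's position and normal, so the in-plane rotation needed for a full 6 DOF pose would have to come from another cue.

```python
import math


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to a list of (x, y, z) points.

    Hypothetical helper for illustration only. Returns (centroid, unit_normal),
    where the normal is oriented toward negative z.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")

    # The centroid lies on the least-squares plane.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n

    # Centered second moments used by the 2x2 normal equations.
    sxx = sxy = syy = sxz = syz = 0.0
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        sxz += dx * dz
        syz += dy * dz

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("points are collinear in x-y; plane is not determined")

    # Solve [sxx sxy; sxy syy] [a; b] = [sxz; syz] by Cramer's rule.
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det

    # The plane a*x + b*y - z + c = 0 has normal (a, b, -1); normalize it.
    length = math.sqrt(a * a + b * b + 1.0)
    normal = (a / length, b / length, -1.0 / length)
    return (cx, cy, cz), normal


# Usage: four points sampled from the plane z = 0.5*x - 0.25*y + 2.
centroid, normal = fit_plane([(0, 0, 2.0), (1, 0, 2.5), (0, 1, 1.75), (1, 1, 2.25)])
```

This parameterization fails when the plane is parallel to the z axis (hand seen edge-on); a total least-squares fit or a robust estimator would avoid that limitation at the cost of more code.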


Keywords: Human–computer interaction; Hand detection; Hand-pose estimation; Tracking; Augmented reality; 3D graphical user interface

Supplementary material

  • 138_2013_500_MOESM1_ESM.avi (35.5 MB)
  • 138_2013_500_MOESM2_ESM.avi (13.9 MB)
  • 138_2013_500_MOESM3_ESM.avi (16 MB)
  • 138_2013_500_MOESM4_ESM.avi (13.8 MB)
  • 138_2013_500_MOESM5_ESM.avi (6.7 MB)
  • 138_2013_500_MOESM6_ESM.avi (4.8 MB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Oytun Akman (1)
  • Ronald Poelman (2)
  • Wouter Caarls (1)
  • Pieter Jonker (1)

  1. Delft Biorobotics Laboratory, Department of Biomechanical Engineering, Delft University of Technology, Delft, The Netherlands
  2. Section Systems Engineering, Faculty of Technology, Policy and Management, Delft University of Technology, Delft, The Netherlands
