
Markerless tracking and gesture recognition using polar correlation of camera optical flow

Machine Vision and Applications

Abstract

We present a novel, real-time, markerless vision-based tracking system, employing a rigid orthogonal configuration of two pairs of opposing cameras. Our system uses optical flow over sparse features to overcome the limitation of vision-based systems that require markers or a pre-loaded model of the physical environment. We show how opposing cameras enable cancellation of common components of optical flow leading to an efficient tracking algorithm that captures five degrees of freedom including direction of translation and angular velocity. Experiments comparing our device with an electromagnetic tracker show that its average tracking accuracy is 80 % over 185 frames, and it is able to track large range motions even in outdoor settings. We also present how our tracking system can be used for gesture recognition by combining it with a simple linear classifier over a set of 15 gestures. Experimental results show that we are able to achieve 86.7 % gesture recognition accuracy.
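The flow-cancellation idea described above can be illustrated with a toy numerical sketch. This assumes a deliberately simplified model in which rotation-induced image flow appears with the same sign in both views of an opposing camera pair, while translation-induced flow flips sign; the sign conventions, the 0.5 factor, and the function names are assumptions for illustration, not the paper's actual derivation.

```python
import numpy as np

def separate_components(flow_a, flow_b):
    """Split per-feature optical flow from an opposing camera pair into
    rotational and translational estimates, under the simplified model
    that rotation contributes identically to both views while
    translation contributes with opposite sign."""
    rotational = 0.5 * (flow_a + flow_b)      # common component survives
    translational = 0.5 * (flow_a - flow_b)   # opposing component survives
    return rotational, translational

# Toy example: one tracked feature per camera, flow vectors in pixels.
rot = np.array([[1.0, 0.0]])     # rotation-induced flow (same in both views)
trans = np.array([[0.0, 2.0]])   # translation-induced flow (sign flips)
flow_a = rot + trans
flow_b = rot - trans

r_est, t_est = separate_components(flow_a, flow_b)
```

By construction, summing cancels the opposing (translational) part and differencing cancels the common (rotational) part, which is the intuition behind using rigidly mounted opposing cameras rather than a single view.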




Author information


Corresponding author

Correspondence to Prince Gupta.

Electronic supplementary material

Below is the Electronic Supplementary Material.

ESM 1 (avi 10046 KB)


About this article

Cite this article

Gupta, P., da Vitoria Lobo, N. & LaViola, J.J. Markerless tracking and gesture recognition using polar correlation of camera optical flow. Machine Vision and Applications 24, 651–666 (2013). https://doi.org/10.1007/s00138-012-0451-3
