Machine Vision and Applications, Volume 24, Issue 3, pp 651–666

Markerless tracking and gesture recognition using polar correlation of camera optical flow

  • Prince Gupta
  • Niels da Vitoria Lobo
  • Joseph J. LaViola Jr.

Original Paper


Abstract

We present a novel, real-time, markerless vision-based tracking system that employs a rigid orthogonal configuration of two pairs of opposing cameras. Our system uses optical flow over sparse features to overcome the limitation of vision-based systems that require markers or a pre-loaded model of the physical environment. We show how opposing cameras enable cancellation of common components of optical flow, leading to an efficient tracking algorithm that captures five degrees of freedom, including direction of translation and angular velocity. Experiments comparing our device with an electromagnetic tracker show that its average tracking accuracy is 80 % over 185 frames, and that it can track large-range motions even in outdoor settings. We also show how our tracking system can be used for gesture recognition by combining it with a simple linear classifier over a set of 15 gestures. Experimental results show that we achieve 86.7 % gesture recognition accuracy.
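The cancellation idea at the heart of the abstract can be illustrated with a toy sketch. Assume an idealized back-to-back camera pair in which a rotation of the rig induces identical mean image flow in both views, while a translation induces equal-and-opposite flow; summing and differencing the two flows then separates the shared (rotation-like) component from the opposing (translation-like) one. The function name and the idealized flow model are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def split_pair_flow(flow_a, flow_b):
    """Split the mean 2D flows (du, dv) of an opposing camera pair,
    each in its own image frame, into a component common to both
    views and a component with opposite sign in the two views."""
    flow_a = np.asarray(flow_a, dtype=float)
    flow_b = np.asarray(flow_b, dtype=float)
    common = 0.5 * (flow_a + flow_b)        # shared part (rotation-like)
    differential = 0.5 * (flow_a - flow_b)  # opposing part (translation-like)
    return common, differential

# Synthetic example: rotation flow (1.0, 0.2) plus translation flow
# (0.5, -0.3) in camera A, minus it in camera B.
common, differential = split_pair_flow((1.5, -0.1), (0.5, 0.5))
```

In this synthetic case the sum/difference recovers the two components exactly; the paper's contribution is making an analogous separation work with real, noisy sparse-feature flow across two orthogonal pairs of opposing cameras.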


Keywords: Optical flow · Polar correlation · Multi-camera · Markerless

Supplementary material

ESM 1: 138_2012_451_MOESM1_ESM.avi (AVI, 9.8 MB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Prince Gupta (1)
  • Niels da Vitoria Lobo (1)
  • Joseph J. LaViola Jr. (1)

  1. Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, USA
