Autonomous Robots, Volume 25, Issue 3, pp 267–286

Visual-model-based, real-time 3D pose tracking for autonomous navigation: methodology and experiments

  • Hans de Ruiter
  • Beno Benhabib


This paper presents a novel 3D-model-based computer-vision method for tracking the full six-degree-of-freedom (dof) pose (position and orientation) of a rigid body in real time. The methodology is targeted at autonomous-navigation tasks, such as interception of, or rendezvous with, mobile targets. Tracking an object’s complete six-dof pose makes the proposed algorithm useful even when targets are not restricted to planar motion (e.g., flying or rough-terrain navigation). Tracking is achieved via a combination of textured-model projection and optical flow. The main contribution of our work is the novel combination of optical flow with the z-buffer depth information produced during model projection, which allows six-dof tracking to be achieved with a single camera.
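The core idea of combining 2D optical flow with z-buffer depth can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the pinhole-camera symbols (f, cx, cy) and the assumption that depth is locally constant over one frame's motion are assumptions made here for clarity.

```python
import numpy as np

def backproject_flow(u, v, du, dv, z, f, cx, cy):
    """Lift a 2D optical-flow vector (du, dv) at pixel (u, v) to a 3D
    displacement, using the per-pixel depth z (e.g., read from a z-buffer
    after model projection) and a pinhole camera with focal length f and
    principal point (cx, cy). Illustrative sketch only.
    """
    # Back-project the pixel and its flow-displaced position to 3D,
    # assuming depth is approximately unchanged over the small motion.
    p0 = np.array([(u - cx) * z / f, (v - cy) * z / f, z])
    p1 = np.array([(u + du - cx) * z / f, (v + dv - cy) * z / f, z])
    return p1 - p0  # 3D displacement of the observed surface point
```

With many such 3D displacement vectors across the object's visible surface, a rigid-body pose update (rotation and translation) can be fitted, which is what makes six-dof tracking possible from a single camera.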

A localized illumination-normalization filter has also been developed to improve robustness to shading. Real-time operation is achieved using GPU-based filters and a new data-reduction algorithm, developed within the framework of our project, that exploits colour-gradient redundancy. Colour-gradient redundancy is an important property of colour images, namely, that the gradients of all colour channels are generally aligned; exploiting it provides a threefold increase in speed. Processing rates of approximately 80 to 100 fps were obtained on both synthetic and real target-motion sequences, and sub-pixel accuracies were achieved in tests performed under different lighting conditions.
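Since colour-channel gradients are generally aligned, the three per-channel gradient fields can be collapsed into a single field. The sketch below keeps, at each pixel, the channel with the strongest gradient; this is one simple reduction consistent with the stated property, and the paper's actual algorithm may differ.

```python
import numpy as np

def dominant_colour_gradient(img):
    """Collapse per-channel image gradients into a single gradient field
    by keeping, at each pixel, the colour channel whose gradient is
    strongest. img: H x W x 3 float array. Returns (gx, gy), each H x W.
    Illustrative sketch of exploiting colour-gradient redundancy.
    """
    # Per-channel gradients along rows (gy) and columns (gx).
    gy, gx = np.gradient(img, axis=(0, 1))
    mag2 = gx**2 + gy**2                  # squared gradient magnitude per channel
    idx = np.argmax(mag2, axis=2)         # strongest channel at each pixel
    take = lambda g: np.take_along_axis(g, idx[..., None], axis=2)[..., 0]
    return take(gx), take(gy)
```

Downstream computations (e.g., optical flow) then process one gradient field instead of three, which is where the roughly threefold speed-up comes from.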


Keywords: Computer vision; Real-time object tracking; Pose tracking; Mobile-robot navigation





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Computer Integrated Manufacturing Laboratory, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada
