International Journal of Computer Vision, Volume 35, Issue 1, pp 45–63

Incremental Focus of Attention for Robust Vision-Based Tracking

  • Kentaro Toyama
  • Gregory D. Hager


We present the Incremental Focus of Attention (IFA) architecture for robust, adaptive, real-time motion tracking. IFA systems combine several visual search and vision-based tracking algorithms into a layered hierarchy. The architecture controls the transitions between layers and executes algorithms appropriate to the visual environment at hand: When conditions are good, tracking is accurate and precise; as conditions deteriorate, more robust, yet less accurate algorithms take over; when tracking is lost altogether, layers cooperate to perform a rapid search for the target and continue tracking.
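The layer transitions described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `Frame`, `Layer`, and `LayeredTracker`, the 1-D interval state, and the scalar `noise` and `tolerance` values are all assumptions made for illustration. The only idea taken from the abstract is that a successful layer hands control to a more precise one, and a failing layer hands control to a more robust one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical 1-D stand-in for a set of candidate target states.
Interval = Tuple[float, float]


@dataclass
class Frame:
    target: Optional[float]  # true position, or None while the target is hidden
    noise: float             # toy measure of how poor the visual conditions are


@dataclass
class Layer:
    name: str
    tolerance: float   # worst noise level this toy layer can cope with
    half_width: float  # precision of the estimate it returns

    def run(self, frame: Frame, search: Interval) -> Optional[Interval]:
        """Narrow `search` around the target, or return None on failure."""
        if frame.target is None or frame.noise > self.tolerance:
            return None
        if not (search[0] <= frame.target <= search[1]):
            return None
        return (frame.target - self.half_width, frame.target + self.half_width)


class LayeredTracker:
    """Layers are ordered from most robust (index 0) to most precise (last)."""

    def __init__(self, layers: List[Layer], full_range: Interval):
        self.layers = layers
        self.full_range = full_range
        # inputs[i] is the search region handed to layer i.
        self.inputs: List[Interval] = [full_range] * len(layers)
        self.level = 0

    def step(self, frame: Frame) -> Tuple[str, Optional[Interval]]:
        """Run the current layer on one frame and return (layer name, result)."""
        layer = self.layers[self.level]
        result = layer.run(frame, self.inputs[self.level])
        if result is None:
            if self.level == 0:
                # Already at the bottom: search the whole range again.
                self.inputs[0] = self.full_range
            else:
                # Fall back to the next more robust layer.
                self.level -= 1
            return layer.name, None
        if self.level + 1 < len(self.layers):
            # Success: hand the narrowed region to the next more precise layer.
            self.level += 1
        self.inputs[self.level] = result
        return layer.name, result
```

With three hypothetical layers such as `Layer("search", 1.0, 10.0)`, `Layer("color", 0.6, 3.0)`, and `Layer("template", 0.2, 0.5)`, calling `step` on successive frames climbs to the most precise layer while the target is visible, descends one layer per frame while it is hidden, and climbs again once it reappears.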

Implemented IFA systems are extremely robust to most common types of temporary visual disturbances. They resist minor visual perturbations and recover quickly after full occlusions, illumination changes, major distractions, and target disappearances. Analysis of the algorithm's recovery times is supported by simulation results and experiments on real data. In particular, examples show that recovery times after lost tracking depend primarily on the number of objects visually similar to the target in the field of view.
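One way to see why the number of similar objects could dominate recovery time is a simple counting argument. The sketch below is not the paper's analysis; it assumes a hypothetical search that verifies candidates one at a time in uniformly random order, where the candidates are the true target plus every visually similar distractor. Under that assumption the mean number of verifications grows linearly with the number of distractors.

```python
from fractions import Fraction


def expected_checks(n_distractors: int) -> Fraction:
    """Mean number of candidates verified before the target is confirmed.

    Assumes (hypothetically) that candidates are examined in uniformly
    random order and that verification never makes a mistake.
    """
    n_candidates = n_distractors + 1  # distractors plus the true target
    # The target is equally likely to be at any position 1..n_candidates,
    # so the mean is the average of those positions.
    return Fraction(sum(range(1, n_candidates + 1)), n_candidates)
```

Under these assumptions the result equals (n + 2) / 2 for n distractors, so doubling the number of similar objects roughly doubles the expected verification work.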

visual tracking, real-time tracking, robust tracking, face tracking





Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Kentaro Toyama (1)
  • Gregory D. Hager (2)
  1. Microsoft Research, One Microsoft Way, Redmond
  2. Department of Computer Science, The Johns Hopkins University, Baltimore
