Video annotation: Computers watching video

  • Aaron F. Bobick
New Directions
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1035)


Recently our research has focused on tools for video annotation, the generation of symbolic descriptions of dynamic scenes. In this paper I describe two specific examples of capabilities being developed for the understanding of action in video sequences. The first is a novel tracking technique that builds context-specific templates used only locally in time and space. The second is work on gesture recognition in which time is represented implicitly, in a probabilistic manner. We believe the future of computer vision lies in the processing of video: while much work has been devoted to the representation and manipulation of static images, we are far behind in developing tools for understanding action.
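
To make the second idea concrete, here is a minimal sketch, not the paper's implementation, of treating a gesture as an ordered sequence of probabilistic states over a feature space such as 2-D hand position. The State class, the recognize function, and the log-likelihood threshold are illustrative names and parameters chosen for this sketch. Time stays implicit: each state may be occupied for any number of frames, and a trajectory is accepted only if it passes through the states in order.

```python
import numpy as np

class State:
    """One gesture state: a Gaussian over the feature space (illustrative)."""

    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=float)
        self.cov = np.asarray(cov, dtype=float)
        self.inv_cov = np.linalg.inv(self.cov)

    def log_likelihood(self, x):
        # Log density of a multivariate Gaussian at feature vector x.
        d = np.asarray(x, dtype=float) - self.mean
        _, logdet = np.linalg.slogdet(self.cov)
        k = self.mean.size
        return -0.5 * (d @ self.inv_cov @ d + logdet + k * np.log(2.0 * np.pi))

def recognize(samples, states, threshold=-1.0):
    """Accept a feature trajectory if it visits the states in order.

    A sample stays in the current state while it is likely enough there;
    otherwise we try to advance to the next state. State durations are
    unconstrained, so timing is represented only implicitly. The threshold
    is scale-dependent and chosen for the toy example below.
    """
    i = 0
    for x in samples:
        if states[i].log_likelihood(x) >= threshold:
            continue
        if i + 1 < len(states) and states[i + 1].log_likelihood(x) >= threshold:
            i += 1
        else:
            return False          # sample fits neither the current nor the next state
    return i == len(states) - 1   # all states were reached, in order

# Toy usage: a left-to-right hand "swipe" along x, modeled with three states.
states = [State([0.0, 0.0], np.eye(2) * 0.1),
          State([1.0, 0.0], np.eye(2) * 0.1),
          State([2.0, 0.0], np.eye(2) * 0.1)]
trajectory = [[0.0, 0.0], [0.1, 0.05], [0.9, 0.0], [1.1, -0.05], [2.0, 0.0]]
print(recognize(trajectory, states))  # True for this trajectory
```

A fuller system would learn the state means and covariances from example trajectories rather than fixing them by hand, but the greedy loop above is enough to show how state ordering, rather than explicit timing, carries the temporal structure of the gesture.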



Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Aaron F. Bobick
  1. MIT Media Laboratory, Cambridge, USA
