International Journal of Computer Vision, Volume 50, Issue 2, pp 203–226

View-Invariant Representation and Recognition of Actions

  • Cen Rao
  • Alper Yilmaz
  • Mubarak Shah


Analysis of human perception of motion shows that the information used to represent a motion is obtained from dramatic changes in the speed and direction of the trajectory. In this paper, we present a computational representation of human action that captures these dramatic changes using the spatio-temporal curvature of a 2-D trajectory. This representation is compact, view-invariant, and capable of explaining an action in terms of meaningful action units called dynamic instants and intervals. A dynamic instant is an instantaneous entity that occurs for only one frame and represents an important change in the motion characteristics. An interval represents the time period between two dynamic instants during which the motion characteristics do not change. Starting without a model, we use this representation for recognition and incremental learning of human actions. The proposed method can discover instances of the same action performed by different people from different viewpoints. Experiments on 47 actions performed by 7 individuals in an environment with no constraints show the robustness of the proposed method.
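The abstract does not give the curvature formula, so the following is only a minimal sketch under stated assumptions. It treats the trajectory as a space curve r(t) = (x(t), y(t), t) and applies the standard differential-geometry curvature |r' × r''| / |r'|³, estimating derivatives by finite differences. The function names, the threshold value, and the rule "a dynamic instant is a thresholded local maximum of curvature" are illustrative assumptions, not details confirmed by the text above.

```python
import numpy as np

def spatiotemporal_curvature(x, y):
    """Curvature of the space curve (x(t), y(t), t), sampled once per frame.

    Assumption: unit frame spacing, so t' = 1 and t'' = 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = np.gradient(x), np.gradient(y)        # first derivatives
    ddx, ddy = np.gradient(dx), np.gradient(dy)    # second derivatives
    # |r' x r''| with r' = (dx, dy, 1) and r'' = (ddx, ddy, 0)
    num = np.sqrt(ddx**2 + ddy**2 + (dx * ddy - ddx * dy)**2)
    den = (dx**2 + dy**2 + 1.0)**1.5               # |r'|^3
    return num / den

def dynamic_instants(kappa, threshold=0.1):
    """Frames where curvature is a strict local maximum above a threshold.

    The threshold is a hypothetical tuning parameter for this sketch.
    """
    k = np.asarray(kappa, dtype=float)
    return [i for i in range(1, len(k) - 1)
            if k[i] > threshold and k[i] > k[i - 1] and k[i] > k[i + 1]]

if __name__ == "__main__":
    # Synthetic trajectory: move right for 10 frames, then turn and move up.
    x = [min(t, 10) for t in range(21)]
    y = [max(t - 10, 0) for t in range(21)]
    kappa = spatiotemporal_curvature(x, y)
    print(dynamic_instants(kappa))  # prints [10], the frame of the turn
```

On this synthetic path the curvature is zero along both straight segments and peaks at the turn, so the frames on either side of the detected instant form the two intervals. Real tracked trajectories are noisy, so some smoothing before differentiation would likely be needed.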

Keywords: action recognition; view-invariant representation; view-invariant matching; spatio-temporal curvature; human perception; instants





Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Cen Rao
    • 1
  • Alper Yilmaz
    • 1
  • Mubarak Shah
    • 1
  1. Computer Vision Laboratory, School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, USA
