Abstract
This paper presents an approach to viewpoint-invariant human action recognition, an area that has so far received scant attention relative to the overall body of work in human action recognition. It has been established previously that there exist no invariants for 3D to 2D projection. However, there exists a wealth of techniques in 2D invariance that can be used to advantage in 3D to 2D projection. We exploit these techniques and model actions in terms of view-invariant canonical body poses and trajectories in 2D invariance space, leading to a simple and effective way to represent and recognize human actions from a general viewpoint. We first evaluate the approach theoretically and show why a straightforward application of the 2D invariance idea will not work. We then describe strategies designed to overcome the problems inherent in the straightforward approach and outline the recognition algorithm. Finally, we present results on 2D projections of publicly available human motion capture data as well as on manually segmented real image sequences. In addition to being robust to viewpoint change, the approach is robust enough to handle different people, minor variabilities in a given action, and the speed of action (and hence frame rate), while encoding sufficient distinction among actions.
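The central idea above, that 2D projective invariants survive the imaging of points lying on a common plane, can be illustrated with a small sketch. It is a generic example, not the authors' algorithm: it assumes five coplanar body points with no three collinear, uses the standard five-point projective invariant (a ratio of 3x3 determinants of homogeneous coordinates), and models a change of viewpoint as a plane-to-image homography. The toy point sequence, the homography `view_b` and all function names are hypothetical, and a single invariant per frame is only a one-dimensional stand-in for a trajectory in invariance space.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Homography = Sequence[Sequence[float]]


def det3(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def five_point_invariant(pts: Sequence[Point]) -> float:
    """Projective invariant of five coplanar points (no three collinear).

    I = (|M431| |M521|) / (|M421| |M531|), where |Mijk| is the determinant of
    the homogeneous coordinates of points i, j, k (1-based). Each point occurs
    equally often in numerator and denominator, so per-point scale factors and
    det(H) cancel under any homography H.
    """
    p = [(x, y, 1.0) for x, y in pts]

    def m(i: int, j: int, k: int) -> float:
        return det3(p[i - 1], p[j - 1], p[k - 1])

    return (m(4, 3, 1) * m(5, 2, 1)) / (m(4, 2, 1) * m(5, 3, 1))


def apply_homography(h: Homography, pts: Sequence[Point]) -> List[Point]:
    """Map 2D points through a 3x3 homography; this models a viewpoint
    change only for points lying on a common 3D plane (an assumption)."""
    out = []
    for x, y in pts:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append((u / w, v / w))
    return out


def invariant_trajectory(frames: Sequence[Sequence[Point]]) -> List[float]:
    """One invariant value per frame, i.e. a 1D trajectory over time."""
    return [five_point_invariant(f) for f in frames]


# Hypothetical toy sequence: four fixed points plus a fifth that moves.
base = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
frames = [base + [(1.0 + 0.2 * t, 1.5 + 0.1 * t)] for t in range(5)]

# An arbitrary, hypothetical second viewpoint of the same plane.
view_b = [[0.9, -0.2, 5.0], [0.1, 1.1, -2.0], [0.01, 0.02, 1.0]]
frames_b = [apply_homography(view_b, f) for f in frames]

traj_a = invariant_trajectory(frames)
traj_b = invariant_trajectory(frames_b)
print([round(v, 6) for v in traj_a])
print([round(v, 6) for v in traj_b])
```

The two printed trajectories agree up to floating-point error even though the image coordinates differ, while the invariant still changes from frame to frame as the fifth point moves. The sketch also hints at why a straightforward approach is fragile: the invariant is undefined whenever three of the points become collinear, and body points are coplanar only in particular poses.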
Additional information
This work was done when the author was a graduate student in the Department of Computer Science and was partially supported by NSF Grant ECS-02-5475. The author is currently with Siemens Corporate Research, Princeton, NJ.
Dr. Chellappa is with the Department of Electrical and Computer Engineering.
About this article
Cite this article
Parameswaran, V., Chellappa, R. View Invariance for Human Action Recognition. Int J Comput Vision 66, 83–101 (2006). https://doi.org/10.1007/s11263-005-3671-4
DOI: https://doi.org/10.1007/s11263-005-3671-4