View-Invariant Modeling and Recognition of Human Actions Using Grammars

  • Abhijit S. Ogale
  • Alap Karapurkar
  • Yiannis Aloimonos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4358)


Abstract

In this paper, we represent human actions as sentences generated by a language built on atomic body poses, or phonemes. Knowledge of body pose is stored only implicitly, as a set of silhouettes seen from multiple viewpoints; no explicit 3D poses or body models are used, and individual body parts are not identified. Actions and their constituent atomic poses are extracted from a set of multiview, multiperson video sequences by an automatic keyframe selection process, and are used to automatically construct a probabilistic context-free grammar (PCFG), which encodes the syntax of the actions. Given a new single-viewpoint video, we can parse it to recognize actions and changes in viewpoint simultaneously. Experimental results are provided.
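To make the parsing step concrete: once a video has been reduced to a string of atomic pose symbols, recognizing an action amounts to finding the most probable derivation under the PCFG. The following is a minimal, self-contained sketch of that idea using a Viterbi-style CYK parser. The pose symbols, action nonterminals, and rule probabilities are invented for illustration; they are not the grammar learned in the paper.

```python
import math
from collections import defaultdict

# Toy PCFG in Chomsky normal form. Terminals are atomic pose keyframes;
# all symbols and probabilities here are assumptions for this sketch.
LEXICAL = {  # pose keyframe -> [(preterminal, probability)]
    "stand":  [("STAND", 1.0)],
    "bend":   [("BEND", 1.0)],
    "crouch": [("CROUCH", 1.0)],
    "armup":  [("ARMUP", 1.0)],
}
BINARY = {  # (B, C) -> [(A, P(A -> B C))]
    ("STAND", "SITREST"):  [("SIT", 0.6)],
    ("BEND", "CROUCH"):    [("SITREST", 1.0)],
    ("STAND", "WAVEREST"): [("WAVE", 0.4)],
    ("ARMUP", "STAND"):    [("WAVEREST", 1.0)],
}

def recognize(poses, starts=("SIT", "WAVE")):
    """Viterbi CYK: return (action, log-probability) for the most
    probable action deriving the pose string, or None if none parses."""
    n = len(poses)
    # chart[i][j] maps nonterminal -> best log-prob over poses[i:j]
    chart = [[defaultdict(lambda: float("-inf")) for _ in range(n + 1)]
             for _ in range(n + 1)]
    for i, pose in enumerate(poses):
        for lhs, p in LEXICAL.get(pose, []):
            chart[i][i + 1][lhs] = max(chart[i][i + 1][lhs], math.log(p))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for B, pb in list(chart[i][k].items()):
                    for C, pc in list(chart[k][j].items()):
                        for A, p in BINARY.get((B, C), []):
                            score = pb + pc + math.log(p)
                            if score > chart[i][j][A]:
                                chart[i][j][A] = score
    scored = [(s, chart[0][n][s]) for s in starts if s in chart[0][n]]
    return max(scored, key=lambda x: x[1]) if scored else None

# e.g. recognize(["stand", "bend", "crouch"]) labels the sequence "SIT"
```

The same chart, extended with rules for viewpoint-change symbols, is what lets a parser of this kind attribute parts of the input string to camera motion rather than body motion, which is the role the paper's grammar plays.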




  1. Rizzolatti, G., Arbib, M.A.: Language within our grasp. Trends in Neurosciences 21, 188–194 (1998)
  2. Aggarwal, J., Park, S.: Human motion: Modeling and recognition of actions and interactions. In: Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization and Transmission, pp. 640–647 (2004)
  3. Wang, L., Hu, W., Tan, T.: Recent developments in human motion analysis. Pattern Recognition 36, 585–601 (2003)
  4. Pavlovic, V., Sharma, R., Huang, T.: Visual interpretation of hand gestures for human-computer interaction: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 677–695 (1997)
  5. Yamato, J., Ohya, J., Ishii, K.: Recognizing human action in time sequential images using hidden Markov model. In: Proceedings of IEEE Conf. Computer Vision and Image Processing, pp. 379–385. IEEE Computer Society Press, Los Alamitos (1992)
  6. Bregler, C.: Learning and recognizing human dynamics in video sequences. In: Proceedings of the IEEE Conf. Computer Vision and Pattern Recognition, pp. 568–574. IEEE Computer Society Press, Los Alamitos (1997)
  7. Brand, M., Kettnaker, V.: Discovery and segmentation of activities in video. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 844–851 (2000)
  8. Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 257–267 (2001)
  9. Kojima, A., Tamura, T., Fukunaga, K.: Natural language description of human activities from video images based on concept hierarchy of actions. IJCV 50, 171–184 (2002)
  10. Sullivan, J., Carlsson, S.: Recognizing and tracking human action. In: Proceedings of European Conference on Computer Vision, pp. 629–644 (2002)
  11. Rao, C., Yilmaz, A., Shah, M.: View-invariant representation and recognition of actions. International Journal of Computer Vision 50, 203–226 (2002)
  12. Davis, J.W., Tyagi, A.: A reliable-inference framework for recognition of human actions. In: Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 169–176. IEEE Computer Society Press, Los Alamitos (2003)
  13. Mori, T., Segawa, Y., Shimosaka, M., Sato, T.: Hierarchical recognition of daily human actions based on continuous hidden Markov models. In: Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition, pp. 779–784. IEEE Computer Society Press, Los Alamitos (2004)
  14. Feng, X., Perona, P.: Human action recognition by sequence of movelet codewords. In: Proceedings of First International Symposium on 3D Data Processing Visualization and Transmission, pp. 717–721 (2002)
  15. Park, J., Park, S., Aggarwal, J.K.: Model-based human motion tracking and behavior recognition using hierarchical finite state automata. In: Laganà, A., Gavrilova, M., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds.) ICCSA 2004. LNCS, vol. 3046, pp. 311–320. Springer, Heidelberg (2004)
  16. Reddy, B.S., Chatterji, B.: An FFT-based technique for translation, rotation and scale-invariant image registration. IEEE Transactions on Image Processing 5, 1266–1271 (1996)

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Abhijit S. Ogale (1)
  • Alap Karapurkar (1)
  • Yiannis Aloimonos (1)

  1. Computer Vision Laboratory, Dept. of Computer Science, University of Maryland, College Park, MD 20742, USA
