Spatially and Temporally Segmenting Movement to Recognize Actions

Green, Richard

doi:10.1007/978-1-4020-6693-1_9

Part of the book series: Computational Imaging and Vision (CIVI, volume 36)

This chapter presents a Continuous Movement Recognition (CMR) framework that forms a basis for segmenting continuous human motion to recognize actions, as demonstrated through the tracking and recognition of hundreds of skills, from gait to twisting somersaults. A novel 3D color clone-body-model is dynamically sized and texture mapped to each person for more robust tracking of both edges and textured regions. Tracking is further stabilized by estimating the joint angles for the next frame using a forward smoothing particle filter, with the search space optimized by utilizing feedback from the CMR system. A new paradigm defines an alphabet of dynemes, small units of movement, to enable recognition of diverse actions. Using multiple Hidden Markov Models, the CMR system attempts to infer the action that could have produced the observed sequence of dynemes.
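The recognition step described above, scoring an observed dyneme sequence against one Hidden Markov Model per action and picking the most likely action, can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the three-symbol dyneme alphabet, the action names "walk" and "jump", every probability value, and the helper names `log_likelihood` and `classify` are assumptions made up for illustration.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward algorithm."""
    n = len(start)
    # Forward variable for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)            # accumulate the log of each scale factor
        alpha = [a / scale for a in alpha]  # normalise to avoid numerical underflow
        # Propagate one step through the transitions, then weight by the emission.
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
    scale = sum(alpha)
    return total + math.log(scale) if scale > 0.0 else float("-inf")

# Hypothetical models over a toy dyneme alphabet {0, 1, 2}.
# Each entry is (start probabilities, transition matrix, emission matrix).
MODELS = {
    "walk": ([0.5, 0.5],
             [[0.1, 0.9], [0.9, 0.1]],
             [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "jump": ([0.9, 0.1],
             [[0.5, 0.5], [0.1, 0.9]],
             [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

def classify(obs, models=MODELS):
    """Return the action whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

if __name__ == "__main__":
    print(classify([0, 1, 0, 1, 0]))
    print(classify([2, 2, 0, 0, 0]))
```

In this sketch each action has its own small model, so adding an action only means adding one more entry to `MODELS`; a real system would learn the parameters from labelled dyneme sequences rather than set them by hand.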

Author information

Authors and Affiliations

Computer Science Department, University of Canterbury, Christchurch, New Zealand
Richard Green


Editor information

Editors and Affiliations

Max-Planck Institute for Computer Science, Stuhlsatzhausenweg 85, D-66123, Saarbrücken, Germany
Bodo Rosenhahn
The University of Auckland, New Zealand
Reinhard Klette
Rutgers University, Piscataway, NJ, USA
Dimitris Metaxas

About this chapter

Cite this chapter

Green, R. (2008). Spatially and Temporally Segmenting Movement to Recognize Actions. In: Rosenhahn, B., Klette, R., Metaxas, D. (eds) Human Motion. Computational Imaging and Vision, vol 36. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6693-1_9

DOI: https://doi.org/10.1007/978-1-4020-6693-1_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6692-4
Online ISBN: 978-1-4020-6693-1
eBook Packages: Computer Science, Computer Science (R0)
