
Spatially and Temporally Segmenting Movement to Recognize Actions

  • Chapter
Human Motion

Part of the book series: Computational Imaging and Vision ((CIVI,volume 36))


This chapter presents a Continuous Movement Recognition (CMR) framework that forms a basis for segmenting continuous human motion to recognize actions, demonstrated through the tracking and recognition of hundreds of skills, from gait to twisting somersaults. A novel 3D color clone-body-model is dynamically sized and texture mapped to each person for more robust tracking of both edges and textured regions. Tracking is further stabilized by estimating the joint angles for the next frame using a forward smoothing particle filter, with the search space optimized using feedback from the CMR system. A new paradigm defines an alphabet of dynemes, small units of movement, to enable recognition of diverse actions. Using multiple Hidden Markov Models, the CMR system attempts to infer the action that could have produced the observed sequence of dynemes.
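The final step of the abstract, choosing among multiple Hidden Markov Models given an observed dyneme sequence, can be illustrated with a minimal sketch. It is not the chapter's implementation: the two action names, the three-symbol dyneme alphabet, and every probability below are hypothetical values chosen only for illustration. Each candidate action has its own HMM, the scaled forward algorithm scores the sequence under each model, and the highest-scoring action is reported.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    obs   : list of dyneme indices (integers)
    start : start[s]    = P(first state is s)
    trans : trans[p][s] = P(next state is s | current state is p)
    emit  : emit[s][d]  = P(dyneme d is observed | state s)
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow and accumulate the log of the scale.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


def classify(obs, models):
    """Score obs under every action model and return (best_action, scores)."""
    scores = {
        name: forward_log_likelihood(obs, *params) for name, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models over a 3-symbol dyneme alphabet {0, 1, 2}.
MODELS = {
    # "walk" alternates between two states that emit dynemes 0 and 1.
    "walk": (
        [1.0, 0.0],
        [[0.1, 0.9], [0.9, 0.1]],
        [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    ),
    # "jump" mostly emits dyneme 2 from either state.
    "jump": (
        [0.5, 0.5],
        [[0.5, 0.5], [0.5, 0.5]],
        [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],
    ),
}

if __name__ == "__main__":
    best, scores = classify([0, 1, 0, 1], MODELS)
    print(best, scores)
```

Scoring every model and taking the maximum is the simplest form of recognition with one HMM per action; it assumes the sequence has already been segmented, which the CMR framework described above is designed to handle for continuous motion.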




Copyright information

© 2008 Springer

About this chapter

Cite this chapter

Green, R. (2008). Spatially and Temporally Segmenting Movement to Recognize Actions. In: Rosenhahn, B., Klette, R., Metaxas, D. (eds) Human Motion. Computational Imaging and Vision, vol 36. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6693-1_9


  • DOI: https://doi.org/10.1007/978-1-4020-6693-1_9

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-6692-4

  • Online ISBN: 978-1-4020-6693-1

  • eBook Packages: Computer Science (R0)
