Automatic Human Activity Segmentation and Labeling in RGBD Videos

  • David Jardim
  • Luís Nunes
  • Miguel Sales Dias
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 56)


Human activity recognition has become one of the most active research topics in image processing and pattern recognition. Manual analysis of video is labour intensive, fatiguing, and error prone. Solving the problem of recognizing human activities from video can lead to improvements in several application fields, such as surveillance systems, human-computer interfaces, sports video analysis, digital shopping assistants, video retrieval, gaming, and health care. This paper aims to recognize each action performed in a continuous sequence of actions recorded with a Kinect sensor, using the positions of the main skeleton joints. The typical approach is to use manually labeled data to perform supervised training. In this paper we propose a method that performs automatic temporal segmentation in order to separate the sequence into a set of actions. By measuring the amount of movement that occurs in each joint of the skeleton, we are able to find the temporal segments that represent the individual actions. We also propose a method for automatically labeling human actions that applies a clustering algorithm to a subset of the available features.
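The abstract does not state the exact movement measure, threshold, or minimum segment length, so the following is only a minimal sketch of movement-based temporal segmentation under stated assumptions. It assumes each frame is a list of (x, y, z) joint positions, takes the summed joint displacement between consecutive frames as the amount of movement, and treats runs of frames whose movement exceeds a threshold as candidate actions. The function names `frame_movement` and `segment_by_movement` and the parameters `threshold` and `min_length` are hypothetical and do not come from the paper.

```python
import math


def frame_movement(frames):
    """Summed Euclidean displacement of all joints between consecutive frames.

    frames: list of frames, each a list of (x, y, z) joint positions.
    Returns a list of length len(frames) - 1.
    """
    movement = []
    for prev, curr in zip(frames, frames[1:]):
        movement.append(sum(math.dist(p, c) for p, c in zip(prev, curr)))
    return movement


def segment_by_movement(frames, threshold=0.05, min_length=3):
    """Find runs of frames with movement above `threshold` (an assumed value).

    Returns (start, end) pairs of inclusive frame indices. Runs with fewer
    than `min_length` moving steps are discarded as noise.
    """
    movement = frame_movement(frames)
    segments = []
    start = None
    for i, m in enumerate(movement):
        if m > threshold and start is None:
            start = i  # movement begins between frame i and frame i + 1
        elif m <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a segment that is still open when the recording ends.
    if start is not None and len(movement) - start >= min_length:
        segments.append((start, len(movement)))
    return segments


# Synthetic single-joint example: still, then moving along x, then still.
xs = [0.0] * 5 + [0.1, 0.2, 0.3, 0.4, 0.5] + [0.5] * 5
frames = [[(x, 0.0, 0.0)] for x in xs]
print(segment_by_movement(frames))  # prints [(4, 9)]
```

Each resulting segment could then be summarized by a feature vector and grouped with a clustering algorithm such as K-means, which is the labeling step the abstract refers to; which features the authors select is not specified here.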


Human motion analysis, Motion-based recognition, Action recognition, Temporal segmentation, Clustering, K-means, Labeling, Kinect, Joints, Video sequences



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • David Jardim (1, 2, 3, 4)
  • Luís Nunes (2, 3)
  • Miguel Sales Dias (1, 3, 4)

  1. MLDC, Lisbon, Portugal
  2. Instituto de Telecomunicações, Lisbon, Portugal
  3. University Institute of Lisbon (ISCTE-IUL), Lisbon, Portugal
  4. ISTAR-IUL, Lisbon, Portugal
