Multimedia Tools and Applications, Volume 62, Issue 3, pp 561–580

Human action segmentation and classification based on the Isomap algorithm

  • Yu-Ming Liang
  • Sheng-Wen Shih
  • Arthur Chun-Chieh Shih


Visual analysis of human behavior has attracted a great deal of attention in the field of computer vision because of its wide variety of potential applications. Human behavior can be segmented into atomic actions, each of which indicates a single, basic movement. To reduce human intervention in the analysis of human behavior, unsupervised learning may be more suitable than supervised learning. However, the complex nature of human behavior analysis makes unsupervised learning a challenging task. In this paper, we propose a framework for the unsupervised analysis of human behavior based on manifold learning. First, a pairwise human posture distance matrix is derived from a training action sequence. Then, the isometric feature mapping (Isomap) algorithm is applied to construct a low-dimensional structure from the distance matrix. Consequently, the training action sequence is mapped into a manifold trajectory in the Isomap space. To identify the break points between the trajectories of any two successive atomic actions, we represent the manifold trajectory in the Isomap space as a time series of low-dimensional points. A temporal segmentation technique is then applied to segment the time series into subseries, each of which corresponds to an atomic action. Next, the dynamic time warping (DTW) approach is used to cluster the atomic action sequences. Finally, we use the clustering results to learn and classify atomic actions according to the nearest neighbor rule. If the distance between an input sequence and the nearest mean sequence is greater than a given threshold, the input is regarded as an unknown atomic action. Experiments conducted on real data demonstrate the effectiveness of the proposed method.
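The final classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each atomic action is already a short sequence of low-dimensional points (for example, Isomap coordinates), uses the textbook DTW recurrence with Euclidean point distance, and treats a hypothetical dictionary `references` of labeled mean sequences as given. The function names, the `threshold` value, and the toy data are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of equal-dimension points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(segment, references, threshold):
    """Nearest-neighbor rule with rejection.

    Returns (label, distance). The label is "unknown" when even the
    nearest reference sequence is farther than `threshold`.
    """
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        d = dtw_distance(segment, ref)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > threshold:
        return "unknown", best_dist
    return best_label, best_dist


# Hypothetical mean sequences for two atomic actions in a 2-D embedding.
references = {
    "raise": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "lower": [(2.0, 2.0), (1.0, 1.0), (0.0, 0.0)],
}

# A slower "raise" (first posture held for two frames) still matches,
# because DTW allows one point to align with several.
slow_raise = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(classify(slow_raise, references, threshold=1.0))

# A trajectory far from every reference is rejected as unknown.
far_away = [(10.0, 10.0), (11.0, 11.0)]
print(classify(far_away, references, threshold=1.0))
```

DTW is a natural fit here because two performances of the same atomic action rarely take the same number of frames; the warping absorbs differences in speed, while the threshold keeps unseen actions from being forced into an existing class.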


Keywords

Human behavior analysis · Unsupervised learning · Manifold learning · Isomap algorithm



The authors would like to thank the National Science Council, Taiwan, for supporting this work under Contract NSC 99-2632-H-156-001-MY3.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Yu-Ming Liang (1)
  • Sheng-Wen Shih (2)
  • Arthur Chun-Chieh Shih (3)

  1. Department of Computer Science and Information Engineering, Aletheia University, Taipei, Taiwan
  2. Department of Computer Science and Information Engineering, National Chi Nan University, Puli, Taiwan
  3. Institute of Information Science, Academia Sinica, Nankang, Taiwan
