Machine Vision and Applications, Volume 25, Issue 4, pp 1007–1018

Silhouette analysis for human action recognition based on maximum spatio-temporal dissimilarity embedding

  • Jian Cheng
  • Haijun Liu
  • Hongsheng Li
Original Paper


In this paper, we present an action recognition method for human silhouette sequences. Inspired by the locality preserving projection and its variants, a novel manifold embedding method, maximum spatio-temporal dissimilarity embedding, is proposed to embed each action frame into a manifold where frames from different action classes are well separated. Unlike existing methods that incorporate both inter-class and intra-class information in the embedding process, our proposed method focuses on maximizing distances between frames that are similar in appearance but come from different classes, and it takes temporal information into consideration. A variant of the Hausdorff distance is introduced for frame and sequence classification. Extensive experimental results and comparisons with state-of-the-art methods demonstrate the effectiveness and robustness of the proposed method for human action silhouette analysis.
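
To illustrate the classification step, the sketch below compares two sets of embedded frames with a mean-of-minimum Hausdorff distance and assigns a query sequence to the nearest reference class. This is only an assumption made for illustration: the abstract does not specify which Hausdorff variant is used, so the averaged form shown here may differ from the one in the paper. The function names (`mean_min_hausdorff`, `classify_sequence`) and the toy two-dimensional embeddings are hypothetical.

```python
import math

def directed_mean_min(a, b):
    """Average, over the points of a, of the distance to the nearest point of b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def mean_min_hausdorff(a, b):
    """Symmetric mean-of-minimum Hausdorff distance between two point sets.

    Assumption: this averaged form is one common variant, chosen here for
    illustration; it is not necessarily the variant used in the paper.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_mean_min(a, b), directed_mean_min(b, a))

def classify_sequence(query, references):
    """Return the label of the reference sequence closest to the query.

    references maps a class label to a list of embedded frames (hypothetical layout).
    """
    return min(references, key=lambda label: mean_min_hausdorff(query, references[label]))

# Toy embedded frames in a 2-D manifold (made-up values for illustration only).
walk = [(0.0, 0.0), (1.0, 0.0)]
run = [(5.0, 5.0), (6.0, 5.0)]
query = [(0.1, 0.0), (0.9, 0.0)]

predicted = classify_sequence(query, {"walk": walk, "run": run})
print(predicted)
```

Averaging the nearest-neighbour distances, rather than taking their maximum as in the classical Hausdorff distance, makes the measure less sensitive to a single outlying frame, which is one plausible reason to prefer a variant for noisy silhouettes.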


Silhouette analysis, Human action recognition, Gait recognition, Manifold learning, Hausdorff distance



This work was supported by the National Science Foundation of China (No. 61301269 and No. 61201271), the Research Fund for the Doctoral Program of Higher Education (No. 20100185120021), and the Science and Technology Cooperation Program with the Academy of China and Sichuan Province (No. 2012JZ0001).



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China
