View-Invariant Action Recognition Using Latent Kernelized Structural SVM

  • Xinxiao Wu
  • Yunde Jia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7576)


This paper goes beyond recognizing human actions from a fixed view and addresses action recognition from an arbitrary view. We propose a novel learning algorithm, called latent kernelized structural SVM, for view-invariant action recognition; it extends the kernelized structural SVM framework to include latent variables. Because camera positions change and are frequently unknown, we treat the view label of an action as a latent variable and infer it implicitly during both learning and inference. Motivated by the geometric correlation between different views and the semantic correlation between different action classes, we additionally propose a mid-level correlation feature, which describes an action video by the set of decision values from pre-learned classifiers for every action class in every view. Each decision value captures both the geometric and the semantic correlation between the action video and the corresponding action class in the corresponding view. We then combine the low-level visual cue, the mid-level correlation description, and high-level label information into a novel nonlinear kernel under the latent kernelized structural SVM framework. Extensive experiments on the multi-view IXMAS and MuHAVi action datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
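Two ideas from the abstract, the mid-level correlation feature and inference with a latent view label, can be illustrated with a small sketch. It is not the authors' implementation and makes several assumptions: linear stand-in classifiers for each (action, view) pair in place of the pre-learned classifiers, an RBF kernel on each feature level, and a hypothetical joint kernel that adds the low-level and mid-level kernels only when the action and view labels agree. All function names are hypothetical. Prediction scores every (action, view) pair with a kernel expansion and keeps the best pair, so the view is inferred rather than given.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for the pre-learned classifiers: one linear
# (weights, bias) pair per (action, view) key.
Classifiers = Dict[Tuple[str, str], Tuple[Sequence[float], float]]
# A joint input: (low-level feature x, correlation feature c, action y, view h).
Joint = Tuple[Sequence[float], Sequence[float], str, str]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def correlation_feature(x: Sequence[float], clfs: Classifiers) -> List[float]:
    """Mid-level feature: decision value of every (action, view) classifier on x.
    Keys are sorted so every video gets the same feature ordering."""
    return [dot(clfs[k][0], x) + clfs[k][1] for k in sorted(clfs)]


def rbf(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def joint_kernel(p: Joint, q: Joint, gamma: float = 1.0) -> float:
    """Assumed joint kernel: low-level plus mid-level RBF kernels,
    nonzero only when both the action and the view labels agree."""
    x1, c1, y1, h1 = p
    x2, c2, y2, h2 = q
    if (y1, h1) != (y2, h2):
        return 0.0
    return rbf(x1, x2, gamma) + rbf(c1, c2, gamma)


def predict(x, c, support: List[Tuple[float, Joint]], actions, views, gamma=1.0):
    """Latent inference: return the (action, view) pair maximizing
    f(x, y, h) = sum_j beta_j * K(z_j, (x, c, y, h)). The view h is not
    observed at test time; it is chosen jointly with the action."""
    best_score, best = -math.inf, None
    for y in actions:
        for h in views:
            z = (x, c, y, h)
            score = sum(beta * joint_kernel(zj, z, gamma) for beta, zj in support)
            if score > best_score:
                best_score, best = score, (y, h)
    return best
```

With two support points labeled ("walk", "cam0") and ("wave", "cam1"), a query identical to the first returns ("walk", "cam0"). Because the assumed label gate zeroes every cross-label term, the score decomposes into one function per (action, view) pair; the paper's kernel, which also incorporates high-level label information, may couple these terms differently.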


Keywords: View-invariant action recognition; latent kernelized structural SVM; correlation feature; multiple level features



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Xinxiao Wu (1)
  • Yunde Jia (1)
  1. Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, P.R. China