Cross-view Action Recognition via Dual-Codebook and Hierarchical Transfer Framework

  • Chengkun Zhang
  • Huicheng Zheng
  • Jianhuang Lai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9007)


In this paper, we focus on the challenging problem of cross-view action recognition. The key to this problem is finding the correspondence between the source and target views, which we establish in two stages. First, we construct a Dual-Codebook composed of two codebooks, one for the source view and one for the target view. Each codeword in one codebook has a corresponding codeword in the other, unlike traditional methods, which build independent codebooks for the two views. We propose an effective co-clustering algorithm based on semi-nonnegative matrix factorization to derive the Dual-Codebook. With the Dual-Codebook, an action can be represented as a Bag-of-Dual-Codes (BoDC) regardless of whether it is observed in the source view or the target view. The Dual-Codebook thus establishes a codebook-to-codebook correspondence, which is the foundation for the second stage. In the second stage, we observe that, although the appearance of action samples changes significantly with viewpoint, the temporal relationship between atom actions within an action should remain stable across views. We therefore propose a hierarchical transfer framework to obtain atom-level feature-to-feature correspondence between the source and target views. The framework is based on a temporal structure that effectively captures the temporal relationship between atom actions within an action. It performs transfer at the atom level over multiple timescales, whereas most existing methods perform only video-level transfer. We carry out a series of experiments on the IXMAS dataset. The results demonstrate that our method achieves superior performance compared with state-of-the-art approaches.
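To make the two ideas in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that a pair of index-aligned codebooks is already available (here they are hand-written toy values rather than the output of the semi-nonnegative matrix factorization co-clustering), quantizes each view with its own codebook, and builds histograms over the shared codeword indices for 1, 2, 4, ... equal temporal segments. All function names, the nearest-codeword assignment, and the equal-length segmentation are assumptions made for illustration only.

```python
from collections import Counter
from math import dist


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: dist(f, codebook[k]))
            for f in features]


def histogram(codes, num_codes):
    """L1-normalised bag-of-codes histogram over the shared codeword indices."""
    counts = Counter(codes)
    total = len(codes) or 1  # avoid division by zero for empty segments
    return [counts.get(k, 0) / total for k in range(num_codes)]


def multiscale_histograms(codes, num_codes, levels=2):
    """Histograms for 1, 2, 4, ... equal temporal segments (coarse to fine)."""
    result = []
    for level in range(levels):
        parts = 2 ** level
        for i in range(parts):
            start = i * len(codes) // parts
            stop = (i + 1) * len(codes) // parts
            result.append(histogram(codes[start:stop], num_codes))
    return result


# Hypothetical paired codebooks: codeword k in the source codebook is assumed
# to correspond to codeword k in the target codebook.
source_codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target_codebook = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

# The same toy action seen from two views, as time-ordered local features.
source_video = [(0.1, 0.0), (0.9, 0.1), (1.1, 0.0), (0.0, 0.9)]
target_video = [(5.1, 5.0), (5.9, 5.1), (6.1, 5.0), (5.0, 5.9)]

# Each view is quantized with its own codebook, but the indices are shared.
source_codes = quantize(source_video, source_codebook)
target_codes = quantize(target_video, target_codebook)

source_repr = multiscale_histograms(source_codes, num_codes=3)
target_repr = multiscale_histograms(target_codes, num_codes=3)
```

In this toy setting the two views yield the same index sequence, so their segment-level histograms are directly comparable even though the raw features differ; with real data the correspondence would only be approximate.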



This work is supported by the National Natural Science Foundation of China (No. 61172141), the Key Projects in the National Science & Technology Pillar Program during the 12th Five-Year Plan Period (No. 2012BAK16B06), and the Science and Technology Program of Guangzhou, China (2014J4100092).




Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China
