View-invariant human action recognition via robust locally adaptive multi-view learning

Feng, Jia-geng; Xiao, Jun

doi:10.1631/FITEE.1500080

Published: 07 November 2015

Frontiers of Information Technology & Electronic Engineering, Volume 16, pages 917–929 (2015)

Abstract

Human action recognition is currently one of the most active research areas in computer vision. It has been widely used in applications such as intelligent surveillance, perceptual interfaces, and content-based video retrieval. However, some extrinsic factors hinder the development of action recognition; for example, in realistic scenes human actions may be observed from arbitrary camera viewpoints. View-invariant analysis is therefore important for action recognition algorithms, and the issue has attracted considerable research attention. In this paper, we present a multi-view learning approach to recognizing human actions from different views. Because most existing multi-view learning algorithms lack data adaptiveness when constructing the nearest-neighbor graph, we propose a robust locally adaptive multi-view learning algorithm based on learning multiple local L1-graphs. We also propose an efficient iterative optimization method to solve the resulting objective function. Experiments on three public view-invariant action recognition datasets, ViHASi, IXMAS, and WVU, demonstrate the data adaptiveness, effectiveness, and efficiency of our algorithm. More importantly, when the feature dimension is correctly selected (i.e., >60), the proposed algorithm consistently outperforms state-of-the-art counterparts and obtains about 6% improvement in recognition accuracy on the three datasets.
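
To make the L1-graph idea in the abstract concrete, the following is a minimal single-view sketch and not the authors' algorithm: the multi-view objective and the iterative optimization method are not given in this preview and are not reproduced here. It assumes the common L1-graph construction in which each sample is reconstructed from all other samples under an L1 penalty, so the number of neighbors adapts to the data instead of being fixed in advance as in a k-nearest-neighbor graph. The function names, the penalty `lam`, the coordinate-descent solver, and the use of absolute coefficients as edge weights are all illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal map of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(target, atoms, lam=0.05, sweeps=200):
    """Approximately minimise 0.5*||target - sum_j w_j*atoms[j]||^2 + lam*||w||_1
    by cyclic coordinate descent (an assumed solver). Returns the weights w."""
    weights = [0.0] * len(atoms)
    residual = list(target)  # target minus the current reconstruction
    norms = [sum(a * a for a in atom) for atom in atoms]
    for _ in range(sweeps):
        for j, atom in enumerate(atoms):
            if norms[j] == 0.0:
                continue  # an all-zero atom cannot contribute
            # Correlation of atom j with the residual that excludes atom j.
            rho = sum(a * r for a, r in zip(atom, residual)) + norms[j] * weights[j]
            new_w = soft_threshold(rho, lam) / norms[j]
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                weights[j] = new_w
    return weights


def l1_graph(samples, lam=0.05):
    """Row i holds the absolute sparse coefficients that reconstruct
    sample i from every other sample; the diagonal stays zero."""
    n = len(samples)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [k for k in range(n) if k != i]
        weights = sparse_code(samples[i], [samples[k] for k in others], lam)
        for k, w in zip(others, weights):
            graph[i][k] = abs(w)
    return graph


# Toy data (hypothetical): two loose clusters of 2-D feature vectors.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
for row in l1_graph(samples):
    print([round(w, 3) for w in row])
```

In this sketch the strongest edge of each sample should point to the other member of its own cluster, and weakly related samples receive small or zero weights. A multi-view variant would presumably build one such graph per feature view and combine them, but how the paper does so cannot be determined from the abstract alone.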

Author information

Authors and Affiliations

Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
Jia-geng Feng & Jun Xiao

Corresponding author

Correspondence to Jia-geng Feng.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61572431), the National Key Technology R&D Program (No. 2013BAH59F00), the Zhejiang Provincial Natural Science Foundation of China (No. LY13F020001), and the Zhejiang Province Public Technology Applied Research Projects, China (No. 2014C33090).

ORCID: Jia-geng FENG, http://orcid.org/0000-0003-4577-4520

About this article

Cite this article

Feng, Jg., Xiao, J. View-invariant human action recognition via robust locally adaptive multi-view learning. Frontiers Inf Technol Electronic Eng 16, 917–929 (2015). https://doi.org/10.1631/FITEE.1500080

Received: 18 March 2015
Accepted: 14 September 2015
Published: 07 November 2015
Issue Date: November 2015
DOI: https://doi.org/10.1631/FITEE.1500080

CLC number

TP391
