Abstract
A new method for human interaction recognition based on sparse representation of feature covariance matrices is presented. First, the dense trajectories (DT) extracted from the video are clustered into groups so that irrelevant trajectories can be eliminated, which greatly reduces the influence of noise on feature extraction. Then, the trajectory tunnels are characterized by feature covariance matrices. This yields discriminative descriptors and addresses the insufficient description of second-order feature statistics in earlier descriptors. Next, an over-complete dictionary is learned from the descriptors, and all descriptors are encoded by sparse coding (SC). Classification is performed with multiple instance learning (MIL), which is better suited to complex environments. The proposed method was tested and evaluated on the WEB-Interaction dataset and the UT-Interaction dataset. The experimental results demonstrate the effectiveness of the method.
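Abstract (translated from the Chinese)
Human action recognition is an important research direction in computer vision and pattern recognition, with broad application prospects in surveillance systems, human-computer interaction, artificial intelligence, and other areas. This paper proposes an interaction recognition method based on sparse representation of covariance matrices. First, the dense trajectories extracted from the video are clustered into trajectory groups to eliminate irrelevant trajectories and reduce the influence of noise on feature extraction. Then, the trajectory tunnels are described by covariance matrices, yielding highly discriminative trajectory-tunnel descriptors; these descriptors have lower dimensionality and effectively address the insufficient description of second-order feature statistics in earlier descriptors. The descriptors are then sparsely encoded using sparse representation. Finally, multiple instance learning is used for action classification. Experiments on the UT-Interaction and WEB-Interaction datasets demonstrate the effectiveness of the proposed method.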
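The covariance-descriptor step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `covariance_descriptor`, the feature layout (one row of `d` features per sampled trajectory point), the regularization term `eps`, and the upper-triangle vectorization are all assumptions made for illustration. The sketch only shows why a group with any number of points maps to a fixed-length vector of `d(d+1)/2` second-order statistics.

```python
import numpy as np


def covariance_descriptor(features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Summarize one trajectory group by its feature covariance matrix.

    features: (n_points, d) array, one row per sampled trajectory point
              (hypothetical layout; the paper's exact features are not given here).
    eps:      small diagonal regularizer (an assumption) that keeps the
              matrix positive definite when there are few points.
    Returns the upper triangle of the (d, d) covariance as a vector of
    length d * (d + 1) // 2.
    """
    n, d = features.shape
    if n < 2:
        raise ValueError("need at least two feature vectors")
    cov = np.cov(features, rowvar=False)   # columns are variables -> (d, d)
    cov = cov + eps * np.eye(d)            # regularize the diagonal
    rows, cols = np.triu_indices(d)        # covariance is symmetric,
    return cov[rows, cols]                 # so keep each entry once


# Usage: 200 synthetic points with 5 features each give a 15-dim descriptor.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 5))
desc = covariance_descriptor(feats)
```

The descriptor length depends only on the number of features, not on how many points the group contains, so groups of different sizes can be fed to the same dictionary-learning and sparse-coding stage.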
Additional information
Foundation item: Project(51678075) supported by the National Natural Science Foundation of China; Project(2017GK2271) supported by the Science and Technology Project of Hunan Province, China
About this article
Cite this article
Wang, J., Zhou, Sc. & Xia, Lm. Human interaction recognition based on sparse representation of feature covariance matrices. J. Cent. South Univ. 25, 304–314 (2018). https://doi.org/10.1007/s11771-018-3738-3
DOI: https://doi.org/10.1007/s11771-018-3738-3