A review on human action analysis in videos for retrieval applications

Ramezani, Mohsen; Yaghmaee, Farzin

doi:10.1007/s10462-016-9473-y

Published: 15 March 2016

Volume 46, pages 485–514, (2016)
Artificial Intelligence Review

Abstract

The number of videos available on the Internet has increased significantly. Content-based video retrieval is used to find the items users want within this large body of video data. Memorizing the details of the videos and the intricate relations between the objects they contain are major challenges of this big data topic. A large portion of video data relates to humans. Human action retrieval has therefore been introduced as a new big data topic that seeks to find video objects based on the human action they contain. It has been applied in domains such as video search, intelligent human–computer interaction, robotics, video surveillance and human behavior analysis. Variations in rotation, scale and style, together with the big video data challenges mentioned above, can affect retrieval accuracy. This paper presents a survey of human action retrieval studies and analyzes their methodologies from the perspectives of action representation and retrieval. Limitations and common datasets of human action retrieval are introduced before the state-of-the-art methodologies are described.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Semnan University, Semnan, Iran
Mohsen Ramezani & Farzin Yaghmaee

Corresponding author

Correspondence to Farzin Yaghmaee.

About this article

Cite this article

Ramezani, M., Yaghmaee, F. A review on human action analysis in videos for retrieval applications. Artif Intell Rev 46, 485–514 (2016). https://doi.org/10.1007/s10462-016-9473-y

Published: 15 March 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10462-016-9473-y
