
A review on human action analysis in videos for retrieval applications

Artificial Intelligence Review

Abstract

The number of videos available on the Internet has increased significantly. Content-based video retrieval is used to find the items users want within this large volume of video data. Retaining the details of videos and the intricate relations between the objects they contain are the major challenges of this big data topic. A large portion of video data relates to humans. Human action retrieval has therefore been introduced as a new big data topic that seeks to find videos based on the human actions they contain. It has been applied in domains such as video search, intelligent human–computer interaction, robotics, video surveillance and human behavior analysis. Variations in rotation, scale and style, together with the big video data challenges mentioned above, can affect retrieval accuracy. This paper presents a survey of human action retrieval studies, analyzing their methodologies from the perspectives of action representation and retrieval. Limitations and common datasets of human action retrieval are introduced before the state-of-the-art methodologies are described.

Corresponding author

Correspondence to Farzin Yaghmaee.

Cite this article

Ramezani, M., Yaghmaee, F. A review on human action analysis in videos for retrieval applications. Artif Intell Rev 46, 485–514 (2016). https://doi.org/10.1007/s10462-016-9473-y

  • DOI: https://doi.org/10.1007/s10462-016-9473-y
