Abstract
Over the past two decades, human action recognition from video has been an important area of research in computer vision. Its applications include surveillance systems, human–computer interaction, and other real-world scenarios in which one of the actors is a human being. Several researchers have published reviews in the context of human action recognition. However, there is a gap in the literature when it comes to the methodologies of STIP-based detectors for human action recognition. This paper presents a comprehensive review of STIP-based methods for human action recognition; STIP-based detectors are robust in detecting interest points from video in the spatio-temporal domain. The paper also summarizes related public datasets useful for comparing the performance of the various techniques.
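To illustrate how a STIP detector operates on a video volume, the following is a minimal sketch of one classic approach, the periodic (cuboid) detector of Dollár et al.: spatial Gaussian smoothing followed by a quadrature pair of 1-D temporal Gabor filters, with interest points taken at local maxima of the response. This NumPy/SciPy sketch and its parameter choices (`sigma`, `tau`, and the frequency tied to `tau`) are illustrative assumptions, not a reference implementation of any method surveyed here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def cuboid_response(video, sigma=2.0, tau=1.5):
    """Response of the periodic (cuboid) detector: spatial Gaussian
    smoothing, then a quadrature pair of 1-D temporal Gabor filters,
    R = (I * g * h_ev)^2 + (I * g * h_od)^2."""
    # video: grayscale volume of shape (T, H, W)
    smoothed = np.stack([gaussian_filter(f, sigma) for f in video])
    t = np.arange(-int(4 * tau), int(4 * tau) + 1)
    omega = 4.0 / tau                       # temporal frequency tied to the scale tau
    env = np.exp(-t ** 2 / tau ** 2)        # Gaussian envelope of the Gabor pair
    h_ev = -np.cos(2 * np.pi * t * omega) * env
    h_od = -np.sin(2 * np.pi * t * omega) * env
    h_ev -= h_ev.mean()                     # remove DC so static regions give zero response
    # convolve along the temporal axis only
    ev = np.apply_along_axis(np.convolve, 0, smoothed, h_ev, mode="same")
    od = np.apply_along_axis(np.convolve, 0, smoothed, h_od, mode="same")
    return ev ** 2 + od ** 2

def detect_stips(video, sigma=2.0, tau=1.5, n_points=10):
    """Return (t, y, x) locations of the strongest local maxima of R."""
    R = cuboid_response(video, sigma, tau)
    # local maxima of R in the 3-D spatio-temporal volume
    peaks = (R == maximum_filter(R, size=5)) & (R > 1e-9)
    coords = np.argwhere(peaks)             # row-major order matches R[peaks]
    order = np.argsort(R[peaks])[::-1][:n_points]
    return coords[order]
```

On a synthetic video containing a briefly flashing blob, the top-ranked points cluster around the blob's spatio-temporal location, which is the behavior STIP detectors exploit: motion events, not static appearance, trigger the response.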
Das Dawn, D., Shaikh, S.H. A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector. Vis Comput 32, 289–306 (2016). https://doi.org/10.1007/s00371-015-1066-2