Abstract
This paper presents a comprehensive survey of activity recognition in video surveillance. It begins by describing simple and complex human activities and their applications, which range from visual surveillance through content-based retrieval to human-computer interaction. The paper is organized to cover every stage of the general framework of human activity recognition, and it summarizes and categorizes recently published research within that framework. Finally, it gives an overview of benchmark databases for activity recognition, a market analysis of video surveillance, and future directions for this application.
Vlasic, D., Baran, I., Matusik, W., Popović, J.: Articulated mesh animation from multi-view silhouettes. ACM Trans. Graph. 27(3), 97:1–97:9 (2008)
Vogler, C., Metaxas, D.: Parallel hidden Markov models for American sign language recognition. In: IEEE Int. Conf. on Computer Vision, vol. 1, pp. 224–228 (1999)
Vosters, L., Shan, C., Gritti, T.: Background subtraction under sudden illumination changes. In: 7th IEEE Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS), pp. 384–391 (2010)
Vu, V.-T., Bremond, F., Thonnat, M.: Automatic video interpretation: a novel algorithm for temporal scenario recognition. In: Proc. 8th Int. Joint Conf. Artif. Intell, pp. 9–15 (2003)
Waltisberg, D., Yao, A., Gall, J., Gool, L.V.: Variations of a hough-voting action recognition system. In: Proc. of Int. Conf. on Pattern Recognition, pp. 1–7 (2010)
Wang, J., Bebis, G., Miller, R.: Robust video-based surveillance by integrating target detection with tracking. In: Proc. Conf. on Computer Vision and Pattern Recognition Workshop, CVPRW ’06, pp. 137–144. IEEE Computer Society, Washington (2006)
Weber, M.: Unsupervised learning of models for object recognition. Ph.D. thesis, California Institute of Technology, Pasadena, California (2000)
Weinland, D., Boyer, E., Ronfard, R.: Action recognition from arbitrary views using 3D exemplars. In: ICCV, Rio de Janeiro, Brazil, vol. 11, pp. 1–7. IEEE Computer Society Press, Los Alamitos (2007)
Weinland, D., Ronfard, R., Boyer, E.: Automatic discovery of action taxonomies from multiple views. In: CVPR, vol. 2, pp. 1639–1645. IEEE Computer Society, Washington (2006)
Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 104(02), 249–257 (2006)
Weinland, D., Ronfard, R., Boyer, E.: A survey of vision-based methods for action representation, segmentation and recognition. Comput. Vis. Image Underst. 115, 224–241 (2011)
Wen, Z., Cai, Z.: A robust object tracking approach using mean shift. In: 3rd IEEE Int. Conf. on Natural Computation, vol. 2, pp. 170–174 (2007)
Wong, S.F., Cipolla, R.: Extracting spatiotemporal interest points using global information. In: ICCV, vol. 11, pp. 1–8. IEEE Press, New York (2007)
Wong, S.F., Kim, T.K., Cipolla, R.: Learning motion categories using both semantic and structural information. In: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1–6 (2007)
Wunsch, P., Hirzinger, G.: Real-time visual tracking of 3D objects with dynamic handling of occlusion. In: Int. Conf. on Robotics and Automation, 97, Albuquerque, New Mexico, USA, vol. 4, pp. 2868–2879 (1997)
Xiang, T.: Video behavior profiling for anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 893–908 (2008)
Xiao, J., Cheng, H., Han, F., Sawhney, H.: Geo-spatial aerial video processing for scene understanding and object tracking. In: CVPR, pp. 1–8. IEEE Press, New York (2008)
Xu, M., Zuo, L., Iyengar, S., Goldfain, A., DelloStritto, J.: A semi-supervised hidden Markov model-based activity monitoring system. In: 33rd Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), Boston, Massachusetts USA, pp. 1794–1797 (2011)
Yacoob, Y., Black, M.J.: Parameterized modeling and recognition of activities. In: 6th Int. Conf. on Computer Vision, pp. 120–127 (1998)
Yamato, J., Ohya, J., Ishii, K.: Recognizing human action in time-sequential images using hidden Markov model. In: Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, pp. 379–385 (1992)
Yamazaki, M., Xu, G., Chen, Y.W.: Detection of moving objects by independent component analysis. In: Proc. of the 7th Asian Conf. on Computer Vision, ACCV’06, vol. 2, pp. 467–478. Springer, Berlin, Heidelberg (2006)
Yang, F., Li, B.: Unsupervised learning of spatial structures shared among images. Vis. Comput. 28(2), 175–180 (2011)
Yilmaz, A., Javed, O., Shah, M.: Object tracking: a survey. ACM Comput. Surv. 38(4), 1–45 (2006)
Yilmaz, A., Li, X., Shah, M.: Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1531–1536 (2004)
Yilmaz, A., Shah, M.: Actions sketch: a novel action representation. In: CVPR, vol. 1, pp. 984–989. IEEE Computer Society, Washington (2005)
Yohannes, Y., Hoddinott, J.: Classification and regression trees. Tech. rep., International Food Policy Research Institute, Washington, DC, USA (1999)
Yokoyama, M., Poggio, T.: A contour-based moving object detection and tracking. In: 2nd Joint IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 271–276 (2005)
Yu, E., Aggarwal, J.K.: Detection of fence climbing from monocular video. In: 18th Int. Conf. on Pattern Recognition, vol. 1, pp. 375–378 (2006)
Yu, T.H., Kim, T.K., Cipolla, R.: Real-time action recognition by spatiotemporal semantic and structural forests. In: Proc. of British Machine Vision Conference, pp. 1–7 (2010)
Zelnik-Manor, L., Irani, M.: Event-based analysis of video. In: Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 123–130 (2001)
Zhan, B., Monekosso, D.N., Remagnino, P., Velastin, S.A., Xu, L.Q.: Crowd analysis: a survey. Mach. Vis. Appl. 19(5–6), 345–357 (2008)
Zhang, D., Gatica-Perez, D., Bengio, S., McCowan, I.: Modeling individual and group actions in meetings with layered hmms. IEEE Trans. Multimed. 8(3), 509–520 (2006)
Zhang, J., Tian, Y., Yang, Y.: Adaptive dynamic model particle filter for visual object tracking. In: ISECS International Colloquium, vol. 1, pp. 333–336. IEEE Press, New York (2009)
Zhang, L., Li, S.Z., Yuan, X., Xiang, S.: Real-time object classification in video surveillance based on appearance learning. In: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Zhao, Y., Gong, H., Lin, L., Jia, Y.: Spatio-temporal patches for night background modeling by subspace learning. In: 19th Int. Conf. on Pattern Recognition, pp. 1–4 (2008)
Zhong, H., Shi, J., Visontai, M.: Detecting unusual activity in video. In: Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 819–826 (2004)
Zhou, S.K., Chellappa, R., Moghaddam, B.: Visual tracking and recognition using appearance-adaptive models in particle filters. IEEE Trans. Image Process. 13(11), 1491–1506 (2004)
Zhu, Y., Dariush, B., Fujimura, K.: Kinematic self retargeting: a framework for human pose estimation. Comput. Vis. Image Underst. 114(12), 1362–1375 (2010)
Vishwakarma, S., Agrawal, A. A survey on activity recognition and behavior understanding in video surveillance. Vis Comput 29, 983–1009 (2013). https://doi.org/10.1007/s00371-012-0752-6
DOI: https://doi.org/10.1007/s00371-012-0752-6