Abstract
We propose an activity-monitoring framework based on a platform called VSIP that enables behavior recognition in varied environments. To let end users participate actively in the development of new applications, VSIP separates algorithms from a priori knowledge. To illustrate how VSIP works, we give a full description of a system built with this platform for recognizing behaviors of isolated individuals, groups of people, and crowds in the context of visual monitoring of metro scenes with multiple cameras. We also demonstrate the framework's capability to easily combine and tune recognition methods dedicated to the visual analysis of specific situations (e.g., single- vs. multiple-actor activities, numerical vs. symbolic actions, or temporal scenarios), and we present other behavior-recognition applications built on the framework. VSIP has performed well on human behavior recognition across different problems and configurations and is suitable for a wide variety of requirements.
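The separation of algorithms from a priori knowledge described above can be sketched as follows. This is a minimal illustrative sketch, not VSIP's actual API: the names `ScenarioModel`, `recognize`, and the example scenario are all hypothetical, assuming only that scenario models are declared as data while a generic recognition routine stays fixed.

```python
from dataclasses import dataclass

@dataclass
class ScenarioModel:
    """Declarative a priori knowledge: a temporal scenario described as data,
    editable by end users without touching the recognition algorithm."""
    name: str
    steps: list  # sub-event names that must occur in this order

def recognize(model: ScenarioModel, observed_events: list) -> bool:
    """Generic recognition algorithm: check whether the observed event
    stream contains the scenario's steps as an ordered subsequence."""
    it = iter(observed_events)
    # 'step in it' advances the iterator, so order is enforced.
    return all(step in it for step in model.steps)

# Hypothetical metro-surveillance scenario, declared rather than coded.
vandalism = ScenarioModel(
    name="vandalism",
    steps=["enters_zone", "approaches_equipment", "strikes_equipment"],
)
```

Tuning the system for a new situation then amounts to writing a new `ScenarioModel` instance; the `recognize` routine is unchanged.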
Cite this article
Brémond, F., Thonnat, M. & Zúñiga, M. Video-understanding framework for automatic behavior recognition. Behavior Research Methods 38, 416–426 (2006). https://doi.org/10.3758/BF03192795