Abstract
Visual events occurring in video streams (such as human postures or more complex activities) are detected from a robust and generic region-based representation of the visual content and inferred using a spatio-temporal language that integrates domain-specific knowledge. More specifically, salient regions of activity are first extracted from the dynamics of salient points across the scene. These regions are then mapped to a domain vocabulary by a state-of-the-art classifier, so that the visual content is described in terms of semantic facts. Occurrences of events, modelled as assertions of a language that expresses spatio-temporal relationships between facts, are inferred from the video description by a forward-reasoning engine. An application to visual event retrieval in videos of meetings is presented as a test case.
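The inference step described above can be illustrated with a minimal forward-chaining sketch. It assumes that each semantic fact is a label holding for one person over a frame interval, and that an event is asserted when two facts about the same person satisfy a temporal relation. The relations `during` and `before`, the fact labels (`standing`, `speaking`, `sitting`), the event names, and the `Rule` structure are all hypothetical illustrations; they are not the paper's vocabulary, its spatio-temporal language, or its reasoning engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fact:
    """A semantic fact: `label` holds for `subject` over frames [start, end]."""
    label: str
    subject: str
    start: int
    end: int


# Two Allen-style interval relations (an illustrative subset, not the paper's language).
def during(a: Fact, b: Fact) -> bool:
    """True if interval a lies within interval b."""
    return b.start <= a.start and a.end <= b.end


def before(a: Fact, b: Fact) -> bool:
    """True if interval a ends strictly before interval b starts."""
    return a.end < b.start


@dataclass(frozen=True)
class Rule:
    """Hypothetical event rule: if a `first` fact and a `second` fact on the same
    subject satisfy `relation`, assert an `event` fact spanning both intervals."""
    event: str
    first: str
    second: str
    relation: Callable[[Fact, Fact], bool]


def forward_chain(facts: set[Fact], rules: list[Rule]) -> set[Fact]:
    """Fire rules repeatedly until no new fact is derived (a fixpoint)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Snapshot candidates so `known` can grow safely inside the loops.
            firsts = [f for f in known if f.label == rule.first]
            seconds = [f for f in known if f.label == rule.second]
            for a in firsts:
                for b in seconds:
                    if a.subject == b.subject and rule.relation(a, b):
                        derived = Fact(rule.event, a.subject,
                                       min(a.start, b.start), max(a.end, b.end))
                        if derived not in known:
                            known.add(derived)
                            changed = True
    return known


# Usage: hypothetical facts, as a region classifier might emit them, for two participants.
facts = {
    Fact("standing", "p1", 100, 200),
    Fact("speaking", "p1", 120, 180),
    Fact("sitting", "p1", 250, 300),
    Fact("sitting", "p2", 0, 300),
}
rules = [
    Rule("presenting", "speaking", "standing", during),
    # Chained rule: it can only fire once "presenting" has been derived.
    Rule("sits_down_after_presenting", "presenting", "sitting", before),
]
result = forward_chain(facts, rules)
for fact in sorted(result - facts, key=lambda f: (f.label, f.subject)):
    print(fact)
```

Because the rules only ever add facts, repeated firing reaches the same fixpoint regardless of rule order. A practical engine would index facts by label instead of rescanning the whole fact base on every pass.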
Acknowledgements
This work is funded by EU-IST project M4 (http://www.m4project.org) and the Swiss NCCR IM2 (Interactive Multimodal Information Management).
Cite this article
Moënne-Loccoz, N., Bruno, E. & Marchand-Maillet, S. Knowledge-based detection of events in video streams from salient regions of activity. Pattern Anal Applic 7, 422–429 (2004). https://doi.org/10.1007/s10044-004-0235-0
DOI: https://doi.org/10.1007/s10044-004-0235-0