Abstract
This paper addresses the problem of event discrimination in generic video documents. We investigate the design of an activity-based similarity measure derived from motion analysis. In an unsupervised context, our approach relies on nonlinear temporal modeling of wavelet-based motion features estimated directly from the video frames. Built on support vector machine (SVM) regression, this nonlinear model learns the behavior of the motion descriptors along the temporal dimension and captures useful information about the dynamic content of the shot. A similarity measure associated with our temporal model is then defined. This measure defines a metric between video segments according to the spatial and temporal properties of their motion and provides a theoretical framework to compare, sort and classify videos. Experiments on a large annotated video database, together with a comparison against a similarity measure based on motion histograms, show that our approach is effective in discriminating between video events without any prior knowledge.
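The idea of comparing shots through a learned temporal model can be illustrated with a small sketch. It is not the method of the paper, whose exact model and measure are not given in the abstract. The assumptions are: each shot is reduced to a scalar motion descriptor per frame (the paper uses wavelet-based features), kernel ridge regression stands in for SVM regression so that the example needs only the standard library, and the dissimilarity is a symmetric cross-prediction error. All function names and parameters (`fit`, `dissimilarity`, `lag`, `gamma`, `ridge`) are hypothetical.

```python
import math

def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two lag vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def embed(series, lag):
    """Turn a series into (previous `lag` values, next value) pairs."""
    xs = [tuple(series[t - lag:t]) for t in range(lag, len(series))]
    ys = [series[t] for t in range(lag, len(series))]
    return xs, ys

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(series, lag=3, gamma=1.0, ridge=1e-3):
    """Fit a nonlinear one-step predictor (kernel ridge, NOT SVR) to one shot."""
    xs, ys = embed(series, lag)
    gram = [[rbf(u, v, gamma) + (ridge if i == j else 0.0)
             for j, v in enumerate(xs)] for i, u in enumerate(xs)]
    return {"xs": xs, "alpha": solve(gram, ys), "lag": lag, "gamma": gamma}

def predict(model, x):
    """Predict the next descriptor value from a lag vector."""
    return sum(a * rbf(x, u, model["gamma"])
               for a, u in zip(model["alpha"], model["xs"]))

def prediction_error(model, series):
    """Mean squared one-step prediction error of a model on a series."""
    xs, ys = embed(series, model["lag"])
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def dissimilarity(a, b, **params):
    """Symmetric cross-prediction error between two shots (hypothetical measure)."""
    model_a, model_b = fit(a, **params), fit(b, **params)
    return 0.5 * (prediction_error(model_a, b) + prediction_error(model_b, a))
```

Under these assumptions, two shots whose descriptors follow the same dynamics predict each other well and score a low dissimilarity, while shots with different dynamics do not. Whether such a quantity is a true metric depends on the construction used in the paper, which this sketch does not reproduce.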
Acknowledgements
This work is funded by the Swiss Interactive Multimodal Information Management (IM2) and the EU IST Multimodal Meeting Manager (M4) projects.
About this article
Cite this article
Bruno, E., Moenne-Loccoz, N. & Marchand-Maillet, S. Unsupervised event discrimination based on nonlinear temporal modeling of activity content. Pattern Anal Applic 7, 402–410 (2004). https://doi.org/10.1007/s10044-005-0242-9
DOI: https://doi.org/10.1007/s10044-005-0242-9