Abstract
Temporal segmentation of videos into meaningful image sequences containing particular activities is an important problem in computer vision. We present a novel algorithm for this semantic video segmentation task, which we accomplish through event detection in a frame-by-frame processing setup. We propose using one-class classification (OCC) techniques to detect events that indicate a new segment, since they have proved successful in object classification and allow for unsupervised event detection in a natural way. Various OCC schemes are tested and compared, and an additional approach based on temporal self-similarity maps (TSSMs) is presented. Evaluation on a challenging, publicly available thermal video dataset shows promising results and demonstrates the suitability of our approaches for temporal video segmentation.
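As a rough illustration of the ideas named in the abstract (not the paper's actual implementation), the sketch below computes a temporal self-similarity map as the matrix of pairwise distances between per-frame descriptors, and flags segment-boundary events with a simple nearest-neighbour novelty score over a sliding window of recent frames. The feature choice, window size, threshold rule, and function names are all illustrative assumptions standing in for the OCC models studied in the paper.

```python
import numpy as np

def tssm(features):
    """Temporal self-similarity map: pairwise Euclidean distances
    between per-frame feature descriptors (rows of `features`)."""
    d = features[:, None, :] - features[None, :, :]
    return np.linalg.norm(d, axis=-1)

def detect_events(features, window=10, k=1, thresh=2.0):
    """Flag frame t as a new-segment event when its distance to its k
    nearest descriptors in the preceding window exceeds `thresh` times
    the window's average spread around its mean. This is a toy
    nearest-neighbour novelty detector, not one of the paper's OCC models."""
    events = []
    for t in range(window, len(features)):
        ref = features[t - window:t]
        # novelty score: mean distance to the k closest recent frames
        dists = np.linalg.norm(ref - features[t], axis=1)
        score = np.sort(dists)[:k].mean()
        # internal spread of the window, used as an adaptive scale
        internal = np.linalg.norm(ref - ref.mean(axis=0), axis=1).mean()
        if score > thresh * internal:
            events.append(t)
    return events
```

On a synthetic sequence whose descriptors jump abruptly (e.g., a new activity entering a thermal scene), the first frame after the jump is flagged; replacing the novelty score with a one-class SVM or Gaussian-process OCC model trained on the window would follow the same frame-by-frame pattern.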
Author information
Additional information
The article is published in the original. This article uses the materials of the report submitted at the 4th International Workshop "Image Mining. Theory and Applications," Barcelona, Spain, February 2013.
Mahesh Venkata Krishna. Born in 1984, received the Bachelor degree in Telecommunications Engineering in 2006 from the Visvesvaraya Technological University, India, and the MSc degree in Communication Engineering from RWTH Aachen in 2011. He is currently a holder of a scholarship from the Graduate Academy for Image Processing of the Free State of Thuringia, Germany, funded by Carl Zeiss AG. He is a member of the Computer Vision Group of Joachim Denzler at the Friedrich Schiller University, Jena. His research interests include video analysis and learning event rules of a scene from visual data.
Paul Bodesheim. Born in 1987, received the Diploma degree in Computer Science (“Diplom-Informatiker”) in 2011 from the Friedrich Schiller University Jena, Germany. He is currently a holder of a scholarship from the Graduate Academy of the University Jena partially funded by the Free State of Thuringia, Germany (“Landesgraduiertenstipendium”) and a PhD student in the Computer Vision Group of Joachim Denzler at the University Jena. His research interests are in the field of computer vision and machine learning, especially one-class classification and novelty detection as well as incremental, large-scale, and life-long learning for visual object category recognition.
Marco Körner. Born in 1984, received the Diploma degree in Computer Science (“Diplom-Informatiker”) in 2008 from the Friedrich Schiller University Jena, Germany. He is currently a PhD student at the Computer Vision Group of Joachim Denzler at the University Jena. His research interests are in the field of 3D computer vision and machine learning, especially action recognition in multi-sensor environments.
Joachim Denzler. Earned the degrees "Diplom-Informatiker," "Dr.-Ing.," and "Habilitation" from the University of Erlangen in the years 1992, 1997, and 2003, respectively. Currently, he holds a position of full professor of computer science and is head of the Chair for Computer Vision, Faculty of Mathematics and Informatics, Friedrich Schiller University Jena. His research interests comprise active computer vision, object recognition and tracking, 3D reconstruction, and plenoptic modeling, as well as computer vision for autonomous systems. He is author and coauthor of over 200 journal and conference papers as well as technical articles. He is a member of the IEEE, the IEEE Computer Society, DAGM, and GI.
Cite this article
Krishna, M.V., Bodesheim, P., Körner, M. et al. Temporal video segmentation by event detection: A novelty detection approach. Pattern Recognit. Image Anal. 24, 243–255 (2014). https://doi.org/10.1134/S1054661814020114