Abstract.
In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide users with a scalable, multilevel video summary. First, video-shot-segmentation and keyframe-extraction algorithms parse video sequences into physical shots and discrete keyframes. Next, an affinity (self-correlation) matrix is constructed to merge visually similar shots into clusters (supergroups). Because high visual similarity between shots does not necessarily mean that they belong to the same story unit, temporal information is also used: temporally adjacent shots (within a specified distance) from each supergroup are merged into video groups. A video-scene-detection algorithm is then proposed to merge temporally or spatially correlated video groups into scenario units. This is followed by a scene-clustering algorithm that eliminates visual redundancy among the units. The result is a hierarchical video content structure of increasing granularity, running from clustered scenes through video scenes and video groups down to keyframes. Finally, we introduce a hierarchical video summarization scheme that applies different approaches at different levels of the video content hierarchy to construct the video summary statically or dynamically. Extensive experiments on real-world videos have been performed to validate the effectiveness of the proposed approach.
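The two grouping steps described above (visual clustering into supergroups, then temporal splitting into video groups) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the shot features, the similarity measure, or the clustering rule, so the sketch assumes one feature vector per shot, cosine similarity as the affinity, and connected components of a thresholded affinity graph as the supergroups. The names `affinity_matrix`, `supergroups`, `temporal_groups`, `sim_threshold`, and `max_gap` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed affinity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def affinity_matrix(features):
    """Shot-by-shot affinity matrix; features[i] describes shot i."""
    return [[cosine(u, v) for v in features] for u in features]

def supergroups(affinity, sim_threshold):
    """Label shots by connected component of the thresholded affinity graph.

    This is a stand-in clustering rule, chosen only for illustration.
    """
    n = len(affinity)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and affinity[i][j] >= sim_threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

def temporal_groups(labels, max_gap):
    """Split each supergroup into runs of shots at most max_gap indices apart."""
    members_by_label = {}
    for shot_index, label in enumerate(labels):
        members_by_label.setdefault(label, []).append(shot_index)
    groups = []
    for label in sorted(members_by_label):
        members = members_by_label[label]
        current = [members[0]]
        for shot_index in members[1:]:
            if shot_index - current[-1] <= max_gap:
                current.append(shot_index)
            else:
                groups.append(current)
                current = [shot_index]
        groups.append(current)
    return sorted(groups)

# Toy input: eight shots drawn from three visual "looks" (A, B, C).
FEATURES = [
    [1.0, 0.1, 0.0],  # shot 0: A
    [0.0, 1.0, 0.1],  # shot 1: B
    [0.9, 0.0, 0.1],  # shot 2: A
    [0.1, 1.0, 0.0],  # shot 3: B
    [0.0, 0.1, 1.0],  # shot 4: C
    [0.0, 0.0, 1.0],  # shot 5: C
    [0.1, 0.0, 1.0],  # shot 6: C
    [1.0, 0.0, 0.0],  # shot 7: A, far from the earlier A shots
]

LABELS = supergroups(affinity_matrix(FEATURES), sim_threshold=0.9)
GROUPS = temporal_groups(LABELS, max_gap=2)
print(LABELS)  # [0, 1, 0, 1, 2, 2, 2, 0]
print(GROUPS)  # [[0, 2], [1, 3], [4, 5, 6], [7]]
```

In the toy input, shot 7 is visually similar to shots 0 and 2 and so lands in the same supergroup, but it is too far away in time and becomes its own group. This mirrors the abstract's point that visual similarity alone does not imply membership in the same story unit.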
Additional information
Published online: 15 September 2004
Correspondence to: Xingquan Zhu
This research has been supported by the NSF under grants 9972883-EIA, 9974255-IIS, 9983248-EIA, and 0209120-IIS, by a grant from the state of Indiana 21st Century Fund, and by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant DAAD19-02-1-0178.
About this article
Cite this article
Zhu, X., Wu, X., Fan, J. et al. Exploring video content structure for hierarchical summarization. Multimedia Systems 10, 98–115 (2004). https://doi.org/10.1007/s00530-004-0142-7
DOI: https://doi.org/10.1007/s00530-004-0142-7